Unit 1
Introduction to big data – convergence of key trends – unstructured data – industry examples of
big data – web analytics – big data applications– big data technologies – introduction to Hadoop
– open source technologies – cloud and big data – mobile business intelligence – Crowd sourcing
analytics – inter and trans firewall analytics.
Data that is very large in size is called Big Data. Normally we work on data of size MB (Word documents, Excel sheets) or at most GB (movies, code), but data of petabyte size, i.e., 10^15 bytes, is called Big Data. It is often stated that almost 90% of today's data has been generated in the past 3 years.
o Social networking sites: Facebook, Google, and LinkedIn all generate huge amounts of data on a day-to-day basis, as they have billions of users worldwide.
o E-commerce sites: Sites like Amazon, Flipkart, and Alibaba generate huge volumes of logs from which users' buying trends can be traced.
o Weather stations: Weather stations and satellites produce very large amounts of data, which are stored and processed to forecast the weather.
o Telecom companies: Telecom giants like Airtel and Vodafone study user trends and publish their plans accordingly; for this, they store the data of their millions of users.
o Share market: Stock exchanges across the world generate huge amounts of data through their daily transactions.
Companies use big data in their systems to improve operations, provide better customer service,
create personalized marketing campaigns and take other actions that, ultimately, can increase
revenue and profits. Businesses that use it effectively hold a potential competitive advantage
over those that don't because they're able to make faster and more informed business decisions.
For example, big data provides valuable insights into customers that companies can use to refine
their marketing, advertising and promotions in order to increase customer engagement and
conversion rates. Both historical and real-time data can be analyzed to assess the evolving
preferences of consumers or corporate buyers, enabling businesses to become more responsive to
customer wants and needs.
Big data is also used by medical researchers to identify disease signs and risk factors and by
doctors to help diagnose illnesses and medical conditions in patients. In addition, a combination
of data from electronic health records, social media sites, the web and other sources gives
healthcare organizations and government agencies up-to-date information on infectious disease
threats or outbreaks.
Here are some more examples of how big data is used by organizations:
In the energy industry, big data helps oil and gas companies identify potential drilling
locations and monitor pipeline operations; likewise, utilities use it to track electrical grids.
Financial services firms use big data systems for risk management and real-time analysis of
market data.
Manufacturers and transportation companies rely on big data to manage their supply chains
and optimize delivery routes.
Other government uses include emergency response, crime prevention and smart city
initiatives.
What are examples of big data?
Big data comes from myriad sources -- some examples are transaction processing systems,
customer databases, documents, emails, medical records, internet clickstream logs, mobile apps
and social networks. It also includes machine-generated data, such as network and server log
files and data from sensors on manufacturing machines, industrial equipment and internet of
things devices.
In addition to data from internal systems, big data environments often incorporate external data
on consumers, financial markets, weather and traffic conditions, geographic information,
scientific research and more. Images, videos and audio files are forms of big data, too, and many
big data applications involve streaming data that is processed and collected on a continual basis.
Volume is the most commonly cited characteristic of big data. A big data environment doesn't
have to contain a large amount of data, but most do because of the nature of the data being
collected and stored in them. Clickstreams, system logs and stream processing systems are
among the sources that typically produce massive volumes of data on an ongoing basis.
Big data also encompasses a wide variety of data types, including the following: structured data, such as transactions and financial records; unstructured data, such as text, documents and multimedia files; and semistructured data, such as web server logs and streaming data from sensors.
Various data types may need to be stored and managed together in big data systems. In addition,
big data applications often include multiple data sets that may not be integrated upfront. For
example, a big data analytics project may attempt to forecast sales of a product by correlating
data on past sales, returns, online reviews and customer service calls.
Velocity refers to the speed at which data is generated and must be processed and analyzed. In
many cases, sets of big data are updated on a real- or near-real-time basis, instead of the daily,
weekly or monthly updates made in many traditional data warehouses. Managing data velocity is
also important as big data analysis further expands into machine learning and artificial
intelligence (AI), where analytical processes automatically find patterns in data and use them to
generate insights.
Looking beyond the original three V's, here are details on some of the other ones that are now
often associated with big data:
Veracity refers to the degree of accuracy in data sets and how trustworthy they are. Raw data
collected from various sources can cause data quality issues that may be difficult to pinpoint.
If they aren't fixed through data cleansing processes, bad data leads to analysis errors that can
undermine the value of business analytics initiatives. Data management and analytics teams
also need to ensure that they have enough accurate data available to produce valid results.
Some data scientists and consultants also add "value" to the list of big data's characteristics.
Not all the data that's collected has real business value or benefits. As a result, organizations
need to confirm that data relates to relevant business issues before it's used in big data
analytics projects.
Variability also often applies to sets of big data, which may have multiple meanings or be
formatted differently in separate data sources -- factors that further complicate big data
management and analytics.
Some people ascribe even more V's to big data; various lists have been created with anywhere from seven to 10 V's.
How is big data stored and processed?
Big data is often stored in a data lake. While data warehouses are commonly built on relational
databases and contain structured data only, data lakes can support various data types and
typically are based on Hadoop clusters, cloud object storage services, NoSQL databases or other
big data platforms.
Many big data environments combine multiple systems in a distributed architecture; for example,
a central data lake might be integrated with other platforms, including relational databases or a
data warehouse. The data in big data systems may be left in its raw form and then filtered and
organized as needed for particular analytics uses. In other cases, it's preprocessed using data
mining tools and data preparation software so it's ready for applications that are run regularly.
Big data processing places heavy demands on the underlying compute infrastructure. The
required computing power often is provided by clustered systems that distribute processing
workloads across hundreds or thousands of commodity servers, using technologies like Hadoop
and the Spark processing engine.
Getting that kind of processing capacity in a cost-effective way is a challenge. As a result, the
cloud is a popular location for big data systems. Organizations can deploy their own cloud-based
systems or use managed big-data-as-a-service offerings from cloud providers. Cloud users can
scale up the required number of servers just long enough to complete big data analytics projects.
The business only pays for the storage and compute time it uses, and the cloud instances can be
turned off until they're needed again.
Structured Data
Structured data can be crudely defined as data that resides in a fixed field within a record.
It is the type of data most familiar from everyday life; examples include a birthday or an address.
A certain schema binds it, so all the data has the same set of properties. Structured data is
also called relational data. It is split into multiple tables to enhance the integrity of the data
by creating a single record to depict an entity. Relationships are enforced by the application
of table constraints.
The business value of structured data lies within how well an organization can utilize its
existing systems and processes for analysis purposes.
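To make this concrete, here is a minimal sketch using Python's built-in sqlite3 module; the customers table and its fields are invented for illustration:

    import sqlite3

    conn = sqlite3.connect(":memory:")   # throwaway in-memory database
    cur = conn.cursor()

    # The schema binds the data: every row has the same set of properties.
    cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, birthday TEXT, address TEXT)")
    cur.execute("INSERT INTO customers VALUES (1, 'Asha', '1995-03-14', 'Chennai')")
    cur.execute("INSERT INTO customers VALUES (2, 'Ravi', '1990-11-02', 'Mumbai')")

    # Fixed fields make structured queries straightforward.
    for row in cur.execute("SELECT name, address FROM customers"):
        print(row)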
Semi-Structured Data
Semi-structured data is not bound by any rigid schema for data storage and handling. The
data is not in the relational format and is not neatly organized into rows and columns like
that in a spreadsheet. However, there are some features like key-value pairs that help in
discerning the different entities from each other.
Since semi-structured data doesn’t need a structured query language, it is commonly
called NoSQL data.
A data serialization language is used to exchange semi-structured data across systems that
may even have varied underlying infrastructure.
Semi-structured content is often used to store metadata about a business process but it can
also include files containing machine instructions for computer programs.
This type of information typically comes from external sources such as social media
platforms or other web-based data feeds.
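As a small illustration (the records and field names are invented), the Python sketch below parses two JSON documents that share a key-value structure but not an identical schema, which is what makes them semi-structured:

    import json

    records = [
        '{"user": "asha", "liked": ["post1", "post2"], "location": "Chennai"}',
        '{"user": "ravi", "device": "mobile"}',   # different keys, still parseable
    ]

    for raw in records:
        doc = json.loads(raw)                     # keys act as self-describing tags
        print(doc.get("user"), doc.get("location", "unknown"))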
Unstructured Data
Unstructured data is the kind of data that doesn’t adhere to any definite schema or set of
rules. Its arrangement is unplanned and haphazard.
Photos, videos, text documents, and log files can be generally considered unstructured data.
Even though the metadata accompanying an image or a video may be semi-structured, the
actual data being dealt with is unstructured.
Additionally, unstructured data is also known as "dark data" because it cannot be analyzed without the proper software tools.
Images, video and audio media content: The media and entertainment industry, surveillance systems, professional publishers and even individuals are constantly creating image, video and audio content. These media files are often stored in structured databases, but such databases do not process or understand the actual contents of the media files, which are in the form of unstructured data.
The ability to interpret and understand media, often in real-time, has far-reaching
implications for governance, business, and healthcare. Some examples are: an analysis of
911 call records could aid criminal investigations, CCTV camera footage could help to
prevent or detect incidents, identifying the persons in a video could be useful in news
reporting, and videos of shoppers could help retailers to understand their movements and
shopping patterns.
Considering the huge volume of data involved, analyzing the content of media files
manually is a daunting task, which is why automation solutions are currently being
developed. For example, natural-language processing can extract text out of audio files
using speech-to-text technology, and the text can be analyzed to perform sentiment
analysis. Metatags are also helpful to classify media files and perform search operations.
Customer sales or service calls can be stored, categorized, transcribed and analyzed to
find meaning. A speech recognition program converts voice to text, and emotion
detection capabilities observe the tone during the call through changes in the customers’
speed, pitch, and volume. Natural language processing helps to identify key themes,
products, and sentiments, equipping the organization to improve the customer experience,
retain customers and enhance sales.
An increasing number of websites and apps are offering visitors a live-chat functionality.
Chat conversation transcripts are a treasure trove of market intelligence if analyzed
correctly. This is where data visualization tools can play a role in helping discover key
themes. Chat data gathered over time helps to understand trends — i.e. whether a topic is
becoming hotter or cooler by the day. This knowledge can go a long way in building
deeper relationships with customers.
What is unstructured data?
Unstructured data is often categorized as qualitative and cannot be processed and analyzed using
conventional data tools and methods. It is also known as "schema independent" or "schema on
read" data.
Examples of unstructured data include text, video files, audio files, mobile activity, social
media posts, satellite imagery, surveillance imagery – the list goes on and on.
Unstructured data is difficult to deconstruct because it has no predefined data model, meaning it
cannot be organized in relational databases. Instead, non-relational or NoSQL databases are the
best fit for managing unstructured data.
Another way to manage unstructured data is to have it flow into a data lake or pool, allowing it
to be in its raw, unstructured format.
Finding the insight buried within unstructured data isn't an easy task. It requires advanced analytics and high technical expertise to make a difference, and that kind of data analysis can be an expensive undertaking for many companies.
More examples of unstructured data:
Unstructured data includes any event or alert sent and received by users within an organization with no fixed file format or direct business dependency.
Unstructured data, which makes up the bulk of what is called big data nowadays, is free-flowing and native to each specific company. It is schema independent and is known as "schema on read." Customizing this data to fit your business strategies can give you a competitive edge over competitors still stuck in traditional decision-making. Here is why:
Unstructured data is easily available and has enough insights businesses can collect to
learn about their product response.
Unstructured data is schema-independent. Hence minor alterations to the database do not
impact cost, time, or resources.
Unstructured data can be stored on shared or hybrid cloud servers with minimal
expenditure on database management.
Unstructured data is in its native format, so data scientists or engineers do not have to define it until it is needed. This keeps file formats open-ended, as the data arrives in many forms such as .mp3, .opus, .pdf, .png, and so on.
Data lakes come with "pay-as-you-use" pricing, which helps businesses cut their costs
and resource consumption.
Challenges of unstructured data
Unstructured data is the fastest-growing form of data being collected and worked with today. Many businesses are switching to more "customer-centric" business models and banking on consumer data. However, working with unstructured data brings the following challenges.
Unstructured data is not the easiest to understand. Users require a proficient background
in data science and machine learning to prepare, analyze and integrate it with machine
learning algorithms.
Unstructured data often rests on shared servers with weaker authentication and encryption, which are more prone to ransomware and cyber attacks.
Currently, there aren't many tools that can manipulate unstructured data apart from cloud
commodity servers and open-source NoSQL DBMS.
Media and Entertainment
Analyzing big data is crucial to generating more revenue and providing personalized experiences in this digitally driven industry. Here are a few ways big data is being applied in media and entertainment today:
Companies like Hulu and Netflix work with an abundance of big data daily to analyze
user tendencies, preferred content, trends in consumption, and much more. As a matter of
fact, Netflix used predictive data analysis to craft its show House of Cards since the
data validated that it’d be a hit with consumers.
Ever wonder why so many streaming services are coming out? That’s because big data is
unveiling new ways to monetize digital content, creating new revenue sources for media
and entertainment companies.
Ads are targeted more strategically thanks to big data analytics software, helping
companies understand the performance of ads more clearly based on certain types of
consumers.
Finance
Big data has fundamentally changed the finance industry, particularly stock trading. The
introduction of quantitative analysis models has marked a shift from manual trading to trading
backed by technology.
The first adopters of this technology were large financial institutions and hedge funds. Now,
quantitative models have become the standard.
These models analyze big data to predict outcomes of certain events in the financial world, make
accurate enter/exit trade decisions, minimize risk using machine learning, and even gauge
market sentiment using opinion mining.
Healthcare
The ability to improve quality of life, provide hyper-personalized patient treatment, and discover
medical breakthroughs makes the healthcare industry a perfect candidate for big data. As a
matter of fact, the healthcare industry is one of the largest recent adopters of big data analytics.
In healthcare, it's not about increasing profits or finding new product opportunities; it's about analyzing and applying big data in a patient-centric way, and there are already many great examples of this in practice today.
Education
Modern learning supported by technology is moving away from what we “think” works and
more toward what we “know” works. Through big data, educators are able to craft more
personalized learning models instead of relying on standardized, one-size-fits-all frameworks.
Big data is helping schools understand the unique needs of students by blending traditional
learning environments with online environments. This allows educators to track the progress of
their students and identify gaps in the learning process.
As a matter of fact, big data is already being used on some college campuses to reduce dropout
rates by identifying risk factors in students who are falling behind in their classes.
Retail
The retail industry has gone digital, and customers expect a seamless experience from online to
brick and mortar. Big data analytics allows retail companies to provide a variety of services and
understand more about their customers.
You’ll find that some of the use cases of big data in retail closely mimic those of media and
entertainment. But in retail, it’s a bit more focused on the full customer lifecycle.
Amazon has set the gold standard when it comes to applying big data for product
recommendations based on past searches on its platform. Using predictive analytics,
Amazon and other retailers are able to accurately predict what you’re likely to purchase
next.
Demand forecasting is another application of big data. For example, retailers like
Walmart and Walgreens regularly analyze changes in weather to see any patterns in
product demand.
Big data is useful for crisis control. For example, in product recalls, big data helps
retailers identify who purchased the product and allows them to reach out accordingly.
Manufacturing
Supply chain management and big data go hand-in-hand, which is why manufacturing is one of
the top industries to benefit from the use of big data.
Monitoring the performance of production sites is more efficient with big data analytics. The use
of analytics is also extremely useful for quality control, especially in large-scale manufacturing
projects.
Big data analytics plays a key role in tracking and managing overhead and logistics across
multiple sites. For example, being able to accurately measure the cost of shop floor tasks can
help reduce labor costs.
Then there’s predictive analytics software, which uses big data from sensors attached to
manufacturing equipment. Early detection of equipment malfunctions can save sites from costly
repairs capable of paralyzing production.
4. Web Analytics
Web analytics is the collection, reporting, and analysis of website data. The focus is on
identifying measures based on your organizational and user goals and using the website data to
determine the success or failure of those goals and to drive strategy and improve the user’s
experience.
Measuring Content
Critical to developing relevant and effective web analysis is creating objectives and calls-to-
action from your organizational and site visitors goals, and identifying key performance
indicators (KPIs) to measure the success or failures for those objectives and calls-to-action. Here
are some examples for building a measurement framework for an informational website:
The process of web analytics involves:
Setting business goals: Defining the key metrics that will determine the success of your
business and website
Collecting data: Gathering information, statistics, and data on website visitors using
analytics tools
Processing data: Converting the raw data you've gathered into meaningful ratios, KPIs,
and other information that tell a story (a short sketch of this step follows the list)
Reporting data: Displaying the processed data in an easy-to-read format
Developing an online strategy: Creating a plan to optimize the website experience to
meet business goals
Experimenting: Doing A/B tests to determine the best way to optimize website
performance
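To make the "processing data" step concrete, here is a toy Python sketch that turns raw collected events into one KPI, the conversion rate; the events and session IDs are invented:

    # Raw events as they might arrive from a collection tool
    raw_events = [
        {"session": "s1", "event": "pageview"},
        {"session": "s1", "event": "purchase"},
        {"session": "s2", "event": "pageview"},
        {"session": "s3", "event": "pageview"},
        {"session": "s3", "event": "signup"},
    ]

    sessions = {e["session"] for e in raw_events}
    converted = {e["session"] for e in raw_events if e["event"] in ("purchase", "signup")}

    # Conversion rate = converting sessions / total sessions
    conversion_rate = len(converted) / len(sessions)
    print(f"Conversion rate: {conversion_rate:.0%}")   # the "reporting" step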
Your company's website is probably the first place your users land to learn more about your product. In fact, your website is also a product. That's why the data you collect on your website visitors can tell you a lot about them and their website and product expectations.
Web analytics tools reveal key details about your site visitors—including their average
time spent on page and whether they’re a new or returning user—and which content
draws in the most traffic. With this information, you’ll learn more about what parts of
your website and product interest users and potential customers the most.
For instance, an analytics tool might show you that a majority of your website visitors are
landing on your German site. You could use this information to ensure you have a
German version of your product that’s well translated to meet the needs of these users.
Conversions could mean real purchases, signing up for your newsletter, or filling out a
contact form on your website. Web analytics can give you information about the total
number of these conversions, how much you earned from the conversions, the percentage
of conversions (number of conversions divided by the number of website sessions), and
the abandonment rate. You can also see the “conversion path,” which shows you how
your users moved through your site before they converted.
By looking at the above data, you can do conversion rate optimization (CRO). CRO will
help you design your website to achieve the optimum quantity and quality of conversions.
Web analytics tools can also show you important metrics that help you boost purchases
on your site. Some tools offer an enhanced ecommerce tracking feature to help you figure
out which are the top-selling products on your website. Once you know this, you can
refine your focus on your top-sellers and boost your product sales.
By connecting your web analytics tool with Google Search Console, it’s possible to track
which search queries are generating the most traffic for your site. With this data, you’ll
know what type of content to create to answer those queries and boost your site’s search
rankings.
It’s also possible to set up onsite search tracking to know what users are searching for on
your site. This search data can further help you generate content ideas for your site,
especially if you have a blog.
Web analytics tools will also help you learn which content is performing the best on your
site, so you can focus on the types of content that work and also use that information to
make product improvements. For instance, you may notice blog articles that talk about
design are the most popular on your website. This might signal that your users care about
the design feature of your product (if you offer design as a product feature), so you can
invest more resources into the design feature. The popular content pieces on your website
could spark ideas for new product features, too.
Web analytics will tell you who your top referral sources are, so you know which
channels to focus on. If you’re getting 80% of your traffic from Instagram, your
company’s marketers will know that they should invest in ads on that platform.
Web analytics also shows you which outbound links on your site people are clicking on.
Your company’s marketing team might discover a mutually beneficial relationship with
these external websites, so you can reach out to them to explore partnership or cross-
referral opportunities.
Website performance metrics vary from company to company based on their goals for
their site. Here are some example KPIs that businesses should consider tracking as a part
of their web analytics practice.
Keep in mind traffic is a relative success metric. If you’re seeing 200 visits a month to a
blog post, that might not seem like great traffic. But if those 200 visits represent high-
intent views—views from prospects considering purchasing your product—that traffic
could make the blog post much more valuable than a high-volume, low-intent piece.
Source of traffic
Web analytics tools allow you to easily monitor your traffic sources and adjust your
marketing strategy accordingly. For example, if you’re seeing lots of traffic from email
campaigns, you can send out more email campaigns to boost traffic.
Total website conversion rate
Total website conversion rate refers to the percentage of people who complete a critically
important action or goal on your website. A conversion could be a purchase or when
someone signs up for your email list, depending on what you define as a conversion for
your website.
Bounce rate
Bounce rate refers to how many people visit just one page on your website and then leave
your site.
Interpreting bounce rates is an art. A high bounce rate could be both negative and positive
for your business. It’s a negative sign since it shows people are not interacting with other
pages on your site, which might signal low engagement among your site visitors. On the
other hand, if they spend quality time on a single page, it might indicate that users are
getting all the information they need, which could be a positive sign. That’s why you
need to investigate bounce rates further to understand what they might mean.
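As a minimal sketch of the underlying arithmetic (with invented numbers), a bounce can be counted as any session that viewed exactly one page:

    pages_per_session = {"s1": 1, "s2": 4, "s3": 1, "s4": 2, "s5": 1}

    bounces = sum(1 for pages in pages_per_session.values() if pages == 1)
    bounce_rate = bounces / len(pages_per_session)
    print(f"Bounce rate: {bounce_rate:.0%}")   # 3 of 5 sessions bounced -> 60%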
Repeat visit rate tells you how many people are visiting your website regularly or
repeatedly. This is your core audience since it consists of the website visitors you’ve
managed to retain. Usually, a repeat visit rate of 30% is good. Anything below 20%
shows your website is not engaging enough.
Monthly unique visitors refers to the number of visitors who visit your site for the first
time each month.
This metric shows how effective your site is at attracting new visitors each month, which
is important for your growth. Ideally, a healthy website will show a steady flow of new
visitors to the site.
Along with tracking these basic metrics, an ecommerce company’s team might also track
additional KPIs to understand how to boost sales:
Shopping cart abandonment rate shows how many people leave their shopping carts
without actually making a purchase. This number should be as low as possible.
Other relevant ecommerce metrics include average order value and the average number
of products per sale. You need to boost these metrics if you want to increase sales.
Web analytics tools
There is a whole range of tools you can use for web analytics, including tools that traditionally
specialize in product analytics or experience analytics. Some of these include:
Adobe Analytics
Amplitude
Contentsquare
Crazy Egg
fullstory
Glassbox
Google Analytics
Heap
Hotjar
Mixpanel
Pendo
Don’t just take our word for it, though. Check out review sites like G2 for a roundup of
the best web analytics tools.
As a general rule, only measure the metrics that are important to your business goals, and
ignore the rest. For example, if your primary goal is to increase sales in a certain location,
you don’t need metrics about anything outside of that location.
Your web analytics tool may also be using incorrect data filters, which may skew the
information it collects, making the data inaccurate and unreliable. And there’s not much
you can do with unreliable data.
Website data is particularly sensitive. Make sure your web analytics tools have proper
monitoring procedures and security testing in place. Take steps to protect your website
against any potential threats.
YouTube also shows recommended videos based on the types of videos a user has previously liked and watched. Based on the content of the video a user is watching, relevant advertisements are shown while the video plays. For example, if someone is watching a tutorial video on big data, an advertisement for another big data course will be shown during that video.
5.3 Smart Traffic System: Data about traffic conditions on different roads is collected through cameras placed beside the roads and at the entry and exit points of the city, and from GPS devices placed in vehicles (Ola and Uber cabs, etc.). All such data is analyzed, and jam-free or less congested, faster routes are recommended. In this way, a smart traffic system can be built in the city through big data analysis. A further benefit is that fuel consumption can be reduced.
5.4 Secure Air Traffic System: Sensors are present at various places in an aircraft (such as the propellers). These sensors capture data like flight speed, moisture, temperature, and other environmental conditions. Based on analysis of such data, environmental parameters within the aircraft are set up and adjusted. By analyzing the aircraft's machine-generated data, it can be estimated how long the machine can operate flawlessly and when it should be replaced or repaired.
5.5 Auto-Driving Car: Big data analysis helps drive a car without human intervention. Cameras and sensors are placed at various spots on the car to gather data such as the size of surrounding vehicles, obstacles, and the distance to them. This data is analyzed, and then various calculations are carried out, such as how far to turn, what the speed should be, and when to stop. These calculations help the car take action automatically.
5.6 Virtual Personal Assistant Tool: Big data analysis helps virtual personal assistant tools (like Siri on Apple devices, Cortana on Windows, and Google Assistant on Android) answer the various questions asked by users. These tools track the user's location, local time, season, other data related to the question asked, etc., and analyze all of it to provide an answer.
As an example, suppose a user asks, "Do I need to take an umbrella?" The tool collects data such as the user's location and the season and weather conditions at that location, then analyzes the data to determine whether there is a chance of rain, and provides the answer.
5.7 IoT:
Manufacturing companies install IoT sensors in machines to collect operational data. Analyzing such data, it can be predicted how long a machine will work without any problem and when it will require repair, so that the company can act before the machine faces serious issues or breaks down completely. Thus, the cost of replacing the whole machine can be saved.
In the healthcare field, big data is making a significant contribution. Using big data tools, data regarding patient experience is collected and used by doctors to give better treatment. IoT devices can sense the symptoms of a probable coming disease in the human body and prevent it by prompting advance treatment. IoT sensors placed near patients and newborn babies constantly keep track of various health conditions such as heart rate and blood pressure. Whenever any parameter crosses the safe limit, an alarm is sent to a doctor, so that the doctor can take steps remotely very quickly.
5.8 Education Sector: Organizations conducting online educational courses utilize big data to find candidates interested in those courses. If someone searches for YouTube tutorial videos on a subject, then online or offline course providers on that subject send that person online advertisements about their courses.
5.9 Energy Sector: Smart electric meters read consumed power every 15 minutes and send this data to the server, where it is analyzed to estimate the times of day when the power load is lowest throughout the city. With this system, manufacturing units and householders are advised to run their heavy machines during night-time hours when the power load is low, so they enjoy a lower electricity bill.
5.10 Media and Entertainment Sector: Media and entertainment service providers like Netflix, Amazon Prime, and Spotify analyze data collected from their users. Data such as which types of videos or music users watch or listen to most and how long users spend on the site is collected and analyzed to set the next business strategy.
2. AutoML
AutoML is also considered a form of modern ML these days. It is being used to reduce human interaction and process all the tasks automatically to solve real-life problems. This functionality covers the whole process, right from raw data to a final ML model. The motive of AutoML is to offer extensive learning techniques and models to non-experts in ML. Note, however, that although AutoML reduces the need for human interaction, that doesn't mean it is going to replace humans completely.
3. Data Fabric
Data Fabric has been in trend for a while now and will continue its dominance in the coming
times. It's an architecture and a set of data services spanning cloud environments. Data fabric has also been cited by Gartner as a leading analytics trend, and it continues to spread across the enterprise. It consists of key data management technologies, including data pipelining, data integration, data governance, etc. It has been openly embraced at enterprise scale because it reduces the time needed to extract business insights, which helps in making impactful business decisions.
4. Cloud Migration
In today's world of technology, businesses are shifting towards cloud technology. Cloud migration has been in trend for a while now and remains the direction of travel. Moving to the cloud has several benefits, and not only businesses but also individuals now rely heavily on cloud technology. Cloud migration is very helpful in terms of performance, as it uplifts the performance, speed, and scalability of any operation, especially during heavy traffic.
5. Data Regulation
Since industries have started changing their working patterns and the way they measure business decisions, it is now easier for them to manage their operations. However, big data has yet to make a larger impact on the legal industry; some firms have started adopting big data structures, but there is a long way to go. Handling data at such a large scale carries a lot of responsibility, and in specific industries such as healthcare and the legal field it cannot be compromised; patient data, for instance, cannot be left to AI methods alone. So better data regulation is going to play a major role in 2022.
6. IoT
7. NLP
Natural Language Processing is a kind of AI that helps in assessing text or voice input provided by humans. In short, it is used nowadays to understand what is being said, and it works like a charm. It is a next-level technological achievement, and you can already find examples, such as asking a machine to read aloud for you. NLP uses a range of methodologies to resolve ambiguity in speech and give it a natural touch. The best examples are Apple's Siri and Google Assistant, where you speak to the AI and it provides useful information as per your need.
8. Data Quality
Data quality has been one of the biggest concerns for companies since late 2021, even though relatively few companies have accepted that data quality is becoming an issue for them; for the rest, it is not yet seen as a concern. To date, companies have not focused on the quality of the data coming from various mining tools, which has resulted in poor data management. The risk is that if "data" is the decision-maker and plays a crucial role, poor-quality data may lead companies to set the wrong targets for their business or to target the wrong group. That is where filtration is required to achieve real milestones.
9. Cyber Security
With the rise of the pandemic (COVID-19), when the world was forced to shut down and companies were left with nothing but work from home, things began changing. Even after many months and years, people remain focused on remote work. Everything has its own pros and cons, and this shift also comes with many challenges, including cyber-attacks. Working remotely demands many safety measures and responsibilities: since employees are outside the corporate security perimeter, it becomes a concern for companies. As people work remotely, cyber attackers have become more active in finding different ways to breach systems.
Taking this into consideration, XDR (Extended Detection and Response) and SOAR (Security Orchestration, Automation and Response) have been introduced, which help detect cyber-attacks by applying advanced security analytics across the network. Therefore, this is and will remain one of the major trends for 2022 in big data and analytics.
10. Predictive Analytics
Predictive analytics helps in identifying future trends and forecasts with the help of certain statistical tools. It analyzes patterns in a meaningful way and is used, for example, in weather forecasting. However, its abilities and techniques are not limited to this; it can be used to sort any data and analyze the statistics based on the patterns found.
Some examples are the share market and product research. Based on the provided data, it measures and provides a full report beforehand, for instance on whether a market share is dipping; or, if you want to launch a product, it collects data from different regions and, based on people's interests, helps you shape your business decision. In this world of heavy competition, predictive analytics is becoming even more in demand and will remain in trend for the upcoming years.
Among the larger concepts of rage in technology, big data technologies are widely associated
with many other technologies such as deep learning, machine learning, artificial intelligence
(AI), and Internet of Things (IoT) that are massively augmented. In combination with these
technologies, big data technologies are focused on analyzing and handling large amounts of real-
time data and batch-related data.
Before we start with the list of big data technologies, let us first discuss the broad classification of this technology. Big Data technology is primarily classified into the following two types:
Operational Big Data Technologies
This type of big data technology mainly includes the basic day-to-day data that people process. Typically, operational big data includes daily data such as online transactions, social media activity, and the data of any particular organization or firm, which is usually needed for analysis using software based on big data technologies. This data can also be referred to as raw data, used as the input for several Analytical Big Data Technologies.
Some specific examples of Operational Big Data Technologies are listed below:
o Online ticket booking system, e.g., buses, trains, flights, and movies, etc.
o Online trading or shopping from e-commerce websites like Amazon, Flipkart, Walmart,
etc.
o Online data on social media sites, such as Facebook, Instagram, Whatsapp, etc.
o The employees' data or executives' particulars in multinational companies.
Analytical Big Data Technologies
Analytical Big Data is commonly referred to as an improved version of Big Data Technologies.
This type of big data technology is a bit more complicated compared with operational big data. Analytical big data is mainly used when performance criteria are involved and important real-time business decisions are made based on reports created by analyzing operational real data. This means that the actual investigation of big data that is important for business decisions falls under this type of big data technology.
Some common examples involving Analytical Big Data Technologies are stock-market analysis, weather forecasting, and the analysis of medical records.
We can categorize the leading big data technologies into the following four sections:
o Data Storage
o Data Mining
o Data Analytics
o Data Visualization
Data Storage
Let us first discuss leading Big Data Technologies that come under Data Storage:
o Hadoop: When it comes to handling big data, Hadoop is one of the leading technologies that come into play. This technology is based entirely on the MapReduce architecture and is mainly used to process batch information. Also, it is capable enough to process tasks in batches. The Hadoop framework was mainly introduced to store and process data in a distributed data processing environment, using commodity hardware and a simple programming execution model.
Apart from this, Hadoop is also best suited for storing and analyzing the data from
various machines with a faster speed and low cost. That is why Hadoop is known as one
of the core components of big data technologies. The Apache Software
Foundation introduced it in Dec 2011. Hadoop is written in Java programming language.
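To give a feel for the map-reduce model Hadoop popularized, here is the classic word-count example written as two small Python scripts in the style of Hadoop Streaming, which lets any program that reads stdin and writes stdout act as a mapper or reducer; the file names are illustrative:

    # mapper.py - emit (word, 1) for every word read from stdin
    import sys
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

    # reducer.py - sum the counts per word (Hadoop sorts mapper output by key)
    import sys
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

On a real cluster, Hadoop runs many mapper and reducer instances in parallel across nodes; locally, the same pipeline can be simulated with: cat input.txt | python mapper.py | sort | python reducer.py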
o MongoDB: MongoDB is another important component of big data technologies in terms of storage. Relational and RDBMS properties do not apply to MongoDB because it is a NoSQL database. It is not like traditional RDBMS databases that use structured query languages; instead, MongoDB uses schema-flexible documents.
The structure of the data storage in MongoDB is also different from traditional RDBMS
databases. This enables MongoDB to hold massive amounts of data. It is based on a
simple cross-platform document-oriented design. The database in MongoDB uses
documents similar to JSON with the schema. This ultimately helps operational data
storage options, which can be seen in most financial organizations. As a result,
MongoDB is replacing traditional mainframes and offering the flexibility to handle a
wide range of high-volume data-types in distributed architectures.
MongoDB Inc. introduced MongoDB in Feb 2009. It is written with a combination of
C++, Python, JavaScript, and Go language.
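As a minimal sketch using the official pymongo driver, assuming a MongoDB server on the default local port; the database, collection, and fields are invented for the example:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["shop"]

    # Documents are JSON-like and schema-flexible: fields may differ per document.
    db.orders.insert_one({"user": "asha", "items": ["pen", "book"], "total": 250})
    db.orders.insert_one({"user": "ravi", "total": 99, "coupon": "NEW10"})

    for order in db.orders.find({"total": {"$gt": 100}}):
        print(order["user"], order["total"])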
o RainStor: RainStor is a popular database management system designed to manage and
analyze organizations' Big Data requirements. It uses deduplication strategies that help
manage storing and handling vast amounts of data for reference.
RainStor was designed in 2004 by a RainStor Software Company. It operates just like
SQL. Companies such as Barclays and Credit Suisse are using RainStor for their big data
needs.
o Hunk: Hunk is mainly helpful when data needs to be accessed in remote Hadoop clusters using virtual indexes. It lets us use the Splunk Search Processing Language (SPL) to analyze data. Also, Hunk allows us to report and visualize vast amounts of data from Hadoop and NoSQL data sources.
Hunk was introduced in 2013 by Splunk Inc. It is based on the Java programming
language.
o Cassandra: Cassandra is one of the leading big data technologies among the list of top
NoSQL databases. It is open-source, distributed and has extensive column storage
options. It is freely available and provides high availability without fail. This ultimately
helps in the process of handling data efficiently on large commodity groups. Cassandra's
essential features include fault-tolerant mechanisms, scalability, MapReduce support,
distributed nature, eventual consistency, query language property, tunable consistency,
and multi-datacenter replication, etc.
Cassandra was developed at Facebook for its inbox search feature and open-sourced in 2008; it is now an Apache Software Foundation project. It is based on the Java programming language.
Data Mining
Let us now discuss leading Big Data Technologies that come under Data Mining:
o Presto: Presto is an open-source, distributed SQL query engine developed to run
interactive analytical queries against huge-sized data sources. The size of data sources
can vary from gigabytes to petabytes. Presto helps in querying the data in Cassandra,
Hive, relational databases and proprietary data storage systems.
Presto is a Java-based query engine that was open-sourced by Facebook in 2013. Companies like Repro, Netflix, Airbnb, Facebook and Checkr are using this big data technology and making good use of it.
o RapidMiner: RapidMiner is defined as the data science software that offers us a very
robust and powerful graphical user interface to create, deliver, manage, and maintain
predictive analytics. Using RapidMiner, we can create advanced workflows and scripting
support in a variety of programming languages.
RapidMiner is a Java-based centralized solution developed in 2001 by Ralf
Klinkenberg, Ingo Mierswa, and Simon Fischer at the Technical University of
Dortmund's AI unit. It was initially named YALE (Yet Another Learning Environment).
A few of the companies making good use of the RapidMiner tool are Boston Consulting Group, InFocus, Domino's, Slalom, and Vivint SmartHome.
o ElasticSearch: When it comes to finding information, ElasticSearch is known as an essential tool. It forms the core of the ELK stack, together with Logstash and Kibana. In simple words, ElasticSearch is a search engine based on the Lucene library and works similarly to Solr. Also, it provides a fully distributed, multi-tenant-capable search engine. This search engine is completely text-based and stores schema-free JSON documents behind an HTTP web interface.
ElasticSearch is primarily written in a Java programming language and was developed in
2010 by Shay Banon. Now, it has been handled by Elastic NV since 2012. ElasticSearch
is used by many top companies, such as LinkedIn, Netflix, Facebook, Google, Accenture,
StackOverflow, etc.
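Here is a hedged sketch using the official elasticsearch Python client (v8-style API) against an assumed local node; the index name and fields are illustrative:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Index a schema-free JSON document
    es.index(index="articles", document={"title": "Big Data Basics", "views": 120})

    # Full-text search over the indexed documents
    resp = es.search(index="articles", query={"match": {"title": "big data"}})
    for hit in resp["hits"]["hits"]:
        print(hit["_source"]["title"])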
Data Analytics
Now, let us discuss leading Big Data Technologies that come under Data Analytics:
o Apache Kafka: Apache Kafka is a popular streaming platform, primarily known for its publish-subscribe messaging capabilities. It is referred to as a distributed streaming platform. It is also described as an asynchronous messaging broker system that can ingest and process real-time streaming data. This platform is almost similar to an enterprise messaging system or messaging queue.
Besides, Kafka also provides a retention period, and data can be transmitted through a producer-consumer mechanism. Kafka has received many enhancements to date and includes additional layers and properties such as the Schema Registry, KTables, and KSQL. It is written in Java and was developed by the Apache software community in 2011. Some top companies using the Apache Kafka platform include Twitter, Spotify, Netflix, Yahoo, and LinkedIn.
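A minimal publish/consume sketch using the third-party kafka-python package, assuming a broker running on localhost:9092; the topic and message are invented:

    from kafka import KafkaProducer, KafkaConsumer

    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("clickstream", b'{"user": "asha", "page": "/home"}')
    producer.flush()                    # make sure the message actually goes out

    consumer = KafkaConsumer("clickstream",
                             bootstrap_servers="localhost:9092",
                             auto_offset_reset="earliest")
    for message in consumer:            # streams records as they arrive
        print(message.value)
        break                           # stop after one message in this demo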
o Splunk: Splunk is known as one of the popular software platforms for capturing,
correlating, and indexing real-time streaming data in searchable repositories. Splunk can
also produce graphs, alerts, summarized reports, data visualizations, and dashboards, etc.,
using related data. It is mainly beneficial for generating business insights and web
analytics. Besides, Splunk is also used for security purposes, compliance, application
management and control.
Splunk Inc. first released Splunk in 2004. It is written in a combination of AJAX, Python, C++ and XML. Companies such as Trustwave, QRadar, and 1Labs are making good use of Splunk for their analytical and security needs.
o KNIME: KNIME is used to draw visual data flows, execute specific steps and analyze
the obtained models, results, and interactive views. It also allows us to execute all the
analysis steps altogether. It consists of an extension mechanism that can add more
plugins, giving additional features and functionalities.
KNIME is based on Eclipse and written in a Java programming language. It was
developed in 2008 by KNIME Company. A list of companies that are making use of
KNIME includes Harnham, Tyler, and Paloalto.
o Spark: Apache Spark is one of the core technologies in the list of big data technologies.
It is one of those essential technologies which are widely used by top companies. Spark is
known for offering In-memory computing capabilities that help enhance the overall speed
of the operational process. It also provides a generalized execution model to support more
applications. Besides, it includes top-level APIs (e.g., Java, Scala, and Python) to ease the
development process.
Also, Spark allows users to process and handle real-time streaming data using batching and windowing techniques. Datasets and DataFrames are built on top of RDDs, the core abstraction of Spark Core, and components like Spark MLlib and GraphX, along with R integration, help analyze and process machine learning and data science workloads. Spark is written in Java, Scala, Python and R. It was initially developed in 2009 at UC Berkeley's AMPLab and later became an Apache Software Foundation project. Companies like Amazon, ORACLE, CISCO, Verizon Wireless, and Hortonworks are using this big data technology and making good use of it.
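To make the RDD-style model concrete, here is the classic PySpark word count; it assumes a local Spark installation with the pyspark package available:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("WordCount").getOrCreate()
    sc = spark.sparkContext

    lines = sc.parallelize(["big data tools", "big data systems"])
    counts = (lines.flatMap(lambda line: line.split())   # split lines into words
                   .map(lambda word: (word, 1))          # pair each word with a count of 1
                   .reduceByKey(lambda a, b: a + b))     # sum the counts per word

    print(counts.collect())   # e.g. [('big', 2), ('data', 2), ('tools', 1), ('systems', 1)]
    spark.stop()

The same pattern scales unchanged from this small in-memory example to files distributed across a cluster, for instance by replacing parallelize with sc.textFile pointed at HDFS.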
o R Language: R is a programming language mainly used for statistical computing and graphics. It is a free software environment used by leading data miners, practitioners and statisticians. The language is primarily useful for developing statistical software and for data analytics.
R 1.0.0 was released by the R Foundation in Feb 2000. R is written primarily in C, Fortran, and R itself. Companies like Barclays, American Express, and Bank of America use R for their data analytics needs.
o Blockchain: Blockchain is a technology that can be used in several applications related
to different industries, such as finance, supply chain, manufacturing, etc. It is primarily
used in processing operations like payments and escrow. This helps in reducing the risks
of fraud. Besides, it enhances the overall processing speed of transactions, increases financial privacy, and internationalizes markets. Additionally, it is also used to fulfill the needs of a shared ledger, smart contracts, privacy, and consensus in any business network environment.
Blockchain technology was first introduced in 1991 by two researchers, Stuart
Haber and W. Scott Stornetta. However, blockchain has its first real-world application
in Jan 2009 when Bitcoin was launched. It is a specific type of database based on Python,
C++, and JavaScript. ORACLE, Facebook, and MetLife are a few of those top companies
using Blockchain technology.
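The core idea can be illustrated with a toy hash chain in Python: each block stores the hash of the previous block, so tampering anywhere breaks the chain. This is an educational sketch, not a real distributed ledger:

    import hashlib, json

    def make_block(data, prev_hash):
        header = json.dumps({"data": data, "prev_hash": prev_hash}, sort_keys=True)
        block_hash = hashlib.sha256(header.encode()).hexdigest()
        return {"data": data, "prev_hash": prev_hash, "hash": block_hash}

    genesis = make_block("payment: A->B 10", "0" * 64)
    block2 = make_block("payment: B->C 4", genesis["hash"])

    # The link: block2 commits to genesis, so altering genesis invalidates block2.
    print(block2["prev_hash"] == genesis["hash"])   # True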
Data Visualization
Let us discuss leading Big Data Technologies that come under Data Visualization:
o Tableau: Tableau is one of the fastest and most powerful data visualization tools, used by leading business intelligence industries. It helps in analyzing data at a very fast speed. Tableau helps create visualizations and insights in the form of dashboards and worksheets.
Tableau is developed and maintained by Tableau Software and was introduced in May 2013. It is written in multiple languages, such as Python, C, C++, and Java. Comparable tools in this space include Cognos, Qlik, and ORACLE Hyperion.
o Plotly: As the name suggests, Plotly is best suited for plotting or creating graphs and relevant components at a faster speed and in an efficient way. It consists of several rich libraries and APIs, such as MATLAB, Python, Julia, REST API, Arduino, R, Node.js, etc. It helps create interactive, styled graphs within Jupyter Notebook and PyCharm.
Plotly was introduced in 2012 by the company of the same name. It is based on JavaScript. Paladins and Bitbank are some of the companies making good use of Plotly.
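A minimal sketch assuming the plotly Python package is installed; the data is invented:

    import plotly.express as px

    fig = px.bar(x=["Jan", "Feb", "Mar"], y=[120, 180, 150],
                 labels={"x": "Month", "y": "Sales"},
                 title="Monthly sales")
    fig.show()   # renders an interactive chart in the browser or notebook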
Apart from the above-mentioned big data technologies, several other big data technologies are emerging. They do not form a fixed list, because the big data ecosystem is constantly evolving; new technologies appear at a very fast pace based on the demands and requirements of the IT industry.
8. Introduction to Hadoop
Hadoop is an Apache open-source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models. A Hadoop framework application works in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
Hadoop Architecture
At its core, Hadoop has two major layers, namely:
o Processing/computation layer (MapReduce); and
o Storage layer (Hadoop Distributed File System, HDFS).
Mobile Business Intelligence
Mobile Business Intelligence (BI) refers to the ability to access and perform BI-related data analysis on mobile devices and tablets. It makes it easier to display KPIs, business metrics, and dashboards. Mobile BI empowers users to use and receive the same features and capabilities as those in desktop or app-based BI software.
Why is mobile BI important?
Businesses these days possess an abundance of data. In this fast-paced environment, everyone
needs real-time data access to make data-driven decisions anytime and anywhere.
The number of organizations using mobile apps such as SaaS applications for critical business processes is increasing every day. Whether you are a CEO, a salesperson, a digital marketer, a department manager, or another employee, mobile BI can help you increase productivity, improve the decision-making process, and boost your business overall.
Advantages of mobile BI
Accessibility
Having company insights at your fingertips is the most valuable advantage of mobile BI. You are not limited to one computer in one location; instead, you can access important data on your mobile device at any time and from any location. Having real-time data insights always available helps improve the overall productivity of your daily operations.
Improved decision–making
Mobile BI apps speed up the decision-making process. When decisions must be made, or when
actions must be taken on the spot and at the moment, mobile BI provides up-to-the-minute
insights based on data to help users when they need it the most.
Mobile BI solutions
Every mobile device can be used to display and work with data. But of course, there are some
significant differences – the size of the screen is obviously difference #1. For example,
some data visualizations require more screen space, while others can easily fit in the width of a
smartphone or a tablet.
There are a couple of ways to implement content on mobile devices with those being the most
common:
Web page – Every mobile device includes a web browser that can access almost any web page.
Depending on some factors, the quality of the accessed page could be terrible, acceptable, or
great. In this scenario, BI developers need to check how their content renders on mobile devices
when creating reports and other data visualizations. This means that they should design the
application specifically for mobile use.
HTML5 site – The same as the web page, but with some improvements. HTML5 enables RIA (Rich Internet Application) content to be displayed across all types of mobile devices without relying on proprietary standards. The advantages of HTML5 include functions such as zooming, double-tapping, and pinching, and the user doesn't have to install an app on their device to be able to use it.
Native app – These are applications that can be downloaded and installed on mobile devices. The application's software is tailored to the OS of the mobile device. It is the most difficult and most expensive solution for manufacturers supporting mobile BI, but it enables interactive and enhanced use of analytics content.
Mobile markets are forever changing their devices, operating systems, and screen sizes – so why
should you have to think about designing for so many types and always updating them?
Best Performance
With native mobile app development, the app is created and optimized for a specific platform.
The result is that the app has a higher level of performance. Native apps are very fast and
responsive because they are built for that specific platform and are compiled using a platform’s
core programming language and APIs. This also creates greater efficiency. The device stores
the app and this allows the software to leverage the device’s processing speed. So when users
navigate through a native mobile app, load times are faster because the content and visual
elements are already stored on their phone.
In contrast, a web app operates as a series of calls to and from remote web pages, and its speed is
constrained by all those Internet connections.
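To make this contrast concrete, the following Python sketch times a local file read against a
network round trip. The URL is illustrative and the absolute numbers depend entirely on the
device and connection, but it shows why locally stored assets load faster.

import time
import urllib.request

def timed(label: str, fn) -> None:
    """Run fn once and print how long it took in milliseconds."""
    start = time.perf_counter()
    fn()
    print(f"{label}: {(time.perf_counter() - start) * 1000:.1f} ms")

def read_local():
    # The "native app" case: assets are already stored on the device.
    with open(__file__, "rb") as f:
        return f.read()

def fetch_remote():
    # The "web app" case: every screen needs at least one server round trip.
    with urllib.request.urlopen("https://example.com", timeout=10) as resp:
        return resp.read()

timed("local read", read_local)
timed("network fetch", fetch_remote)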
Another benefit of a native app is that, because it is developed for a particular platform, it can
fully utilize the software and operating system features. The app can directly access the
device's hardware, such as the GPS, camera, and microphone, so it executes faster, which
ultimately results in a better user experience.
10. Crowdsourcing Analytics
Crowdsourcing is the practice of obtaining information, ideas, or services from a large,
open-minded, and rapidly changing group of people in the form of ideas, micro-tasks, finances,
and so on. Crowdsourcing is most commonly associated with the use of the internet to attract a
large group of people to divide tasks or achieve a goal. Jeff Howe and Mark Robinson coined the
term in 2006 as a combination of "crowd" and "outsourcing."
Crowdsourcing can assist various types of organizations in obtaining new ideas and solutions;
contributors may provide services (such as ideas, votes, microtasks, and finances) for payment
or as volunteers. In today's crowdsourcing, digital platforms are frequently used to attract and
divide work among participants. Crowdsourcing, however, is not limited to online activity, and
there are numerous historical examples that predate the internet.
Crowdsourcing allows companies to tap into a world of ideas and enables many people to work
through a rapid design process. You can outsource work to a large group of people to ensure that
your project benefits from far more perspectives than your own staff could provide.
Crowdsourcing is very powerful because it allows for a wide range of participation from people
at low or no cost. Suggestions are provided by experienced professionals and volunteers who are
compensated only if their ideas are implemented; crowdsourcing initiatives rely on the
creativity that people are willing to share. All that contributors require is an opportunity to
participate. This is especially true when people use the internet for crowdsourcing; many
people, for example, create and post videos on YouTube.
There are numerous avenues for crowdsourcing, such as enlisting volunteers, blogs, hotlines,
distribution incentives, free products, and so on. Companies such as IdeaScale and InnoCentive
specialize in delivering the crowd, allowing you to tap directly into a predefined group of
people with relevant expertise.
Crowdsourcing also has a very low cost, which is why every organization should consider it as a
way to tap into a global pool of creativity. It helps a company drive and motivate mass
collaboration and innovation while staying ahead of the competition.
Step 1: Recognize "The Crowd":
Before you can effectively crowdsource, you must first identify the crowd. Most of the time, it is
not as simple as "every employee." While you may want to send out broad-based surveys (like
the Boeing initiative mentioned above), you may also want to narrow your focus for information
gathering.
To determine which topics are relevant, you may want to survey only engineers, or only
salespeople to learn how they can best improve their numbers. When crowdsourcing content,
it's critical to limit your "crowd" to subject matter experts: you don't want every Tom, Dick,
and Harriet telling you how to perform a delicate engineering operation! In many cases, you
will be able to segment your crowd in exactly this way.
Anyone interested in crowdsourcing training should acquire or develop an internal platform that
allows for easy submission of crowdsourced content, efficient content management, and easy
access for those seeking information from the crowd. Improved technology allows for greater
connectivity and efficiency within organizations, and training has enormous potential to benefit
from it. So, the first step is to make it simple for people to provide you with the information
you require.
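As a sketch of what "easy submission" might look like, here is a minimal Python/Flask endpoint.
The route, field names, and in-memory store are all hypothetical choices for illustration, not a
reference to any specific product.

from flask import Flask, jsonify, request

app = Flask(__name__)
submissions = []  # in-memory store; a real platform would persist to a database

@app.route("/submit", methods=["POST"])
def submit_idea():
    """Accept a crowdsourced training idea as JSON."""
    data = request.get_json(force=True)
    # Keep the bar to contribute low: only a topic and the idea text are required.
    if not data.get("topic") or not data.get("idea"):
        return jsonify(error="both 'topic' and 'idea' are required"), 400
    submissions.append({"topic": data["topic"], "idea": data["idea"]})
    return jsonify(status="received", total=len(submissions)), 201

if __name__ == "__main__":
    app.run(port=5000)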
Once you've cleared that (not insignificant) hurdle, you'll need to determine what type of
information to gather. When crowdsourcing topics, it's best to keep things simple. In the Boeing
example, employees were given four options, with the ability to add further ideas of their own.
A simpler approach not only makes it easier for employees to respond, but it also keeps
everything in perspective for both them and you. They'll know they're not being given free rein
to re-invent the company's training department after four questions, and when it comes time to
prioritize topics, it allows you to compare responses directly.
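Prioritizing the results can be as simple as counting votes. Here is a minimal Python sketch,
assuming the survey responses have been collected as a flat list of chosen topics (the fixed
options plus any write-ins); the topic names are invented for illustration.

from collections import Counter

# Hypothetical responses: the fixed survey options plus one write-in idea.
responses = [
    "safety training", "new-hire onboarding", "safety training",
    "software tools", "safety training", "compliance refresher",
    "software tools", "shop-floor ergonomics",  # write-in
]

tally = Counter(responses)
print("Topics by demand:")
for topic, votes in tally.most_common():
    print(f"  {votes:3d}  {topic}")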
No matter how good the information you gather is, it is always your responsibility to ensure that
your audience receives the right information in the right way. That is why, as a training
professional, crowdsourcing will never be able to replace you: YOU are the final filter between
the crowd's input and your audience.
Advantages of Crowdsourcing:
As previously stated, crowdsourcing greatly aids business growth. Crowdsourcing, whether for a
business idea or a logo design, directly engages people while saving money and energy.
Crowdsourced marketing will undoubtedly gain traction in the coming years as the world becomes
ever more connected.
Companies can gain access to amazing suggestions for a new product or service, or for a new
solution to a difficult problem by posing a question to a diverse talent pool. This not only aids in
problem-solving but also allows groups to feel connected to businesses and organizations.
Building this contributor community can have significant marketing, brand visibility, and media
coverage benefits.
1. Lower costs: While winning ideas should be rewarded, providing these rewards is
usually much less expensive than formally hiring people to solve problems.
2. Greater speed: Using a larger pool of people can accelerate the problem-solving
process, particularly when completing a large number of small tasks in real time.
3. More diversity: Some businesses (particularly smaller businesses) may lack internal
diversity. They can benefit from others with different backgrounds, values, and life
experiences, who bring innovative ideas from various fields, thereby assisting businesses
in all fields to grow.
Crowdsourcing can be used to find solutions to a wide range of problems. This can range from
something as simple as a band asking its fans which cities they should visit on their next tour to
more ambitious projects like genetic researchers seeking assistance in sequencing the human
genome.
The breadth and diversity of social media also offer enormous potential for crowdsourcing, as
demonstrated by the Obama administration's use of Twitter to solicit questions for town hall
debates and football clubs asking fans to vote for the starting lineup ahead of each match.
Crowdsourcing can also take the form of idea competitions, such as Ideas for Action, which
provides a platform for students and young professionals to submit solutions to global innovation
challenges.
11. Inter and Trans Firewall Analytics
What is a Firewall?
A firewall is network security hardware or a software application that monitors and filters
incoming and outgoing network traffic according to a set of security rules. It serves as a
barrier between internal private networks and public networks (such as the public Internet).
A common question is whether a firewall is hardware or software. As noted above, it can be
either a network security appliance or a computer software program; firewall functionality is
available at both the hardware and software levels, and it is preferable to have both.
Need of Firewall:
Firewalls are used to protect against malware and network-based threats, and they can also help
prevent application-layer attacks. They serve as a barrier or gatekeeper, keeping track of every
connection our machine makes to another network and only letting data packets pass through if
the data is coming from or going to a trusted source designated by the user.
Firewalls are built so that they can identify and counter threats across the network quickly.
They can enforce rules that have been set up to defend the network and conduct quick inspections
to look for any unusual activity. In other words, the firewall can be used as a traffic
controller.
The following are some of the major dangers of not having a firewall:
Open Access:
If a computer is not protected by a firewall, it allows unrestricted access to other networks.
This means it accepts any type of connection that is made by another party. In this situation it
is impossible to detect threats or attacks travelling over our network, and we leave our devices
vulnerable to malicious users and other undesired sources.
Lost or Compromised Data:
Without a firewall we are leaving our devices open to everyone, which means that anyone on the
network can gain access to a device and take complete control over it. Cybercriminals can then
easily destroy our data or use our personal information for their own gain.
Network Crashes:
Anyone could gain access to our network and shut it down if we didn't have a firewall, forcing
us to devote precious time and resources to restoring our network's functionality.
As a result, it is critical to employ firewalls to protect our network, computer, and data from
unauthorized access.
How does a Firewall Work?
A firewall system examines network traffic according to pre-set rules. The traffic is then
filtered, and any traffic coming from untrustworthy or suspect sources is blocked; only traffic
that the rules are configured to accept is allowed through. Firewalls typically intercept
network traffic at a computer's port, or entry point. According to pre-defined security
criteria, firewalls allow or block particular data packets (units of communication carried over
a digital network), so that only trusted IP addresses or sources are allowed to send traffic in.
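The rule-matching idea can be shown in a few lines of Python. This is a deliberately simplified,
first-match sketch that keys rules on source network and destination port only; real firewalls
match on many more criteria, and the rules here are invented for illustration.

from dataclasses import dataclass
from ipaddress import ip_address, ip_network

@dataclass
class Rule:
    source: str   # CIDR network the packet's source IP must fall within
    port: int     # destination port the rule applies to
    action: str   # "allow" or "block"

# Example rule set: permit internal HTTPS, block telnet from anywhere.
RULES = [
    Rule("10.0.0.0/8", 443, "allow"),
    Rule("0.0.0.0/0", 23, "block"),
]
DEFAULT_ACTION = "block"  # deny anything no rule explicitly accepts

def filter_packet(src_ip: str, dst_port: int) -> str:
    """Return the action of the first rule matching this packet."""
    for rule in RULES:
        if ip_address(src_ip) in ip_network(rule.source) and dst_port == rule.port:
            return rule.action
    return DEFAULT_ACTION

print(filter_packet("10.1.2.3", 443))    # allow: trusted internal source
print(filter_packet("203.0.113.9", 23))  # block: telnet rule
print(filter_packet("203.0.113.9", 80))  # block: default deny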
Firewall’s Functions
As previously established, the firewall acts as a gatekeeper. It examines all attempts to obtain
access to our operating system and blocks traffic from unidentified or unknown sources.
We can think of the firewall as a traffic controller since it operates as a barrier or filter between
the computer system and external networks (such as the public Internet). As a result, the major
function of a firewall is to protect our network and information by managing network traffic,
prohibiting unwanted incoming network traffic, and validating access by scanning network
traffic for dangerous things like hackers and viruses. Most operating systems (for example,
Windows OS) and security applications provide firewall capability by default. As a result, it’s a
good idea to make sure those options are enabled. We can also adjust the system’s security
settings to update automatically whenever new information becomes available.
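Checking that the built-in firewall is actually enabled can be scripted. Below is a small Python
sketch using the status commands that ship with each platform (netsh on Windows, ufw on many
Linux distributions); it assumes those tools are present and that it runs with sufficient
privileges.

import platform
import subprocess

def firewall_status() -> str:
    """Return the raw status output of the platform's firewall tool."""
    if platform.system() == "Windows":
        cmd = ["netsh", "advfirewall", "show", "allprofiles", "state"]
    else:
        cmd = ["ufw", "status"]  # assumes ufw is the firewall front end in use
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout

print(firewall_status())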
Firewalls have grown in power over the years and now encompass a number of built-in functions
and capabilities beyond basic traffic filtering.
Firewalls are the first line of defence in network security. However, are they powerful enough
on their own to protect our devices from cyber-attacks? The answer is probably "no." It is best
to employ a firewall whenever you use the Internet, but other defence mechanisms should be used
alongside it to help safeguard the network and the data saved on the computer. Because cyber
threats are always evolving, a firewall should not be the only thing you rely on to protect your
home network.
Firewalls are obviously important as a security solution; nonetheless, they have significant
limitations:
Firewalls cannot prevent users from accessing dangerous websites, leaving the organisation open
to internal threats and attacks.
If security rules are misconfigured, a firewall cannot protect you.
Firewalls do not protect against non-technical security risks such as social engineering.
Firewalls cannot prevent or stop attackers using modems from dialling in to or out of the
internal network.
As a result, it's a good idea to keep all Internet-connected devices up to date, including the
most recent versions of operating systems, web browsers, apps, and other security software
(such as anti-virus). Furthermore, securing the wireless router should be standard practice:
changing the router's name and password regularly, reviewing security settings, and setting up
a guest network for visitors are all ways to protect it.
Types of Firewalls:
Different types of firewalls exist, each with its own structure and functionality. The following is
a list of some of the most prevalent firewall types: