
System Integration

Topics Covered:

1. Introduction to System Integration


What is system integration?
Integration is the act of bringing together smaller components or information stored in
different subsystems into a single functioning unit.
In an IT context, integration refers to the end result of a process that aims to combine
different -- often disparate -- subsystems so that the data contained in each becomes
part of a larger, more comprehensive system that, ideally, quickly and easily shares data
when needed. This often requires organizations to build a customized architecture or
structure of applications to combine new or existing hardware, software and other
components.

Figure: Explanation of the four integration methods.
System integration methods
There are a variety of methods for achieving connectivity between unrelated systems.
Based on the type of usage and business requirements, there are four common
integration methods as follows:
1. Vertical integration. This strategy enables an organization to integrate unrelated
subsystems into one functional unit by creating silos based on their functionalities. Each
layer or element in vertical integration works upward and the process of integration is
expedited by using only a handful of vendors, partners and developers. Considered to
be the quickest method of integration, it can also be the riskiest, as it requires a
significant capital investment.
2. Horizontal integration. The horizontal integration method, also known as the
enterprise service bus (ESB), assigns a specialized subsystem to communicate with
other subsystems. It reduces the number of interfaces connecting directly to the ESB to
just one, decreasing integration costs and providing flexibility. It's also considered a
business expansion strategy, where one company might acquire another one from the
same business line or value chain.
3. Point-to-point integration. Also commonly known as star integration or spaghetti
integration, this method interconnects each subsystem with the remaining subsystems. Once
connected, these interconnected systems resemble a star polyhedron. Most companies segment
their processes during point-to-point integration. For example, a separate accounting
system could track finances; a web analytics system could manage website traffic; and a
customer relationship management (CRM) system such as Salesforce could manage customer
data. Depending on the organizational needs, data from each system could be pulled and combined.
4. Common data format. The common data format helps businesses by providing data
translation and promoting automation. This method was created to avoid having an
adapter for every process of converting data to or from other formats of applications. For
integration to take place, enterprise application integration is used, which enables the
transformation of the data format of one system to be accepted by another system. A
popular example of a common data format is the conversion of zip codes to city names
by merging objects from one application or database to another.
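As a rough illustration of the common data format idea, the sketch below (in Python, with entirely hypothetical field names) maps records from two different applications into one shared canonical shape, so each system needs only a single adapter rather than one adapter per pairing.

# A minimal sketch of a common (canonical) data format; all field names are hypothetical.
# Each application gets one adapter that converts its native record into the shared shape,
# instead of a separate adapter for every pair of applications.

def from_crm(record: dict) -> dict:
    # The CRM stores the name split into first/last and calls the postal code "zip"
    return {
        "customer_id": record["id"],
        "full_name": f"{record['first_name']} {record['last_name']}",
        "postal_code": record["zip"],
    }

def from_billing(record: dict) -> dict:
    # The billing system holds the same data under different field names
    return {
        "customer_id": record["acct_no"],
        "full_name": record["name"],
        "postal_code": record["postcode"],
    }

crm_row = {"id": 7, "first_name": "Ana", "last_name": "Cruz", "zip": "4027"}
billing_row = {"acct_no": 7, "name": "Ana Cruz", "postcode": "4027"}

print(from_crm(crm_row) == from_billing(billing_row))  # True: both map to one shared format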
Benefits of system integration
Integrations of all types are beneficial whether they're conducted for CRM or enterprise
resource planning (ERP).
The following are a few benefits of system integration:
 Automation and streamlining of processes. The biggest benefit of system integration
is the streamlining and aggregation of relevant and correlated data. This helps with
automating and simplifying data collection and processing of data across the different
subsystems.
 Improved accessibility and syncing of data. Due to the streamlined processes, users
don't have to wait for the data to be manually synced up across multiple systems to
access it. For example, if one subsystem makes a data change, it's automatically
updated for other systems as well. This improves the accessibility and real-time
availability of data.
 Improved data accuracy. The accuracy of data is improved. Since all data is
automatically updated and synchronized between disparate subsystems, the chances of
inaccuracy are reduced, as users no longer access outdated data.
 Increased efficiency and revenue. System integration automates the flow of
information between systems, eliminating the need for repetitive and manual data entry.
This enables users to allocate their time and resources wisely and improve their overall
efficiency. It also helps with increased revenue potential. For example, in the healthcare
industry, an integrated payment system opens up more channels for payment collection
at every patient interaction.
 Scalability. Most integrated systems rely heavily on the cloud for data storage, which
makes business scalability less of a challenge. Instead of having a separate computing
platform for each subsystem, integrated systems enable all subsystems to use a
centralized storage platform, thus eliminating the need for replication of resources. If a
business grows exponentially and additional storage is required, it can be quickly scaled
up by the cloud provider.
 Cost effective. With an integration platform, companies don't need to exhaust their
financial resources to manage individual subsystems and software applications and
accrue extra maintenance costs. Integrated systems also eliminate repetitive tasks and
eradicate the need for multiple data stores, which drastically cuts down on data storage
costs.
Challenges of system integration
System integration is an intricate process. Even with the best strategies in place and a
multitude of resources at a company's disposal, things don't always go as planned.
The following are common challenges of system integration:
 Legacy systems. Organizations that have been around for a while tend to have legacy
systems in place that are critical to their business processes and can't be replaced
easily. The challenges to achieving integration mostly have to do with the inherent
difficulties of linking a series of diverse and monolithic existing systems that could be
produced by multiple different manufacturers. Other integration challenges can include a
lack of a coherent or unifying data structure linking all of the different systems, an
unwieldy framework of multiple different applications and support systems, the sheer
age of the different systems and the actual delivery of the information to key business
units that need it. These data integration challenges hinder overall process efficiency
because poor data exchange between systems prevents quick communication among
business units.
 Rapid changes in the integration landscape. The IT and information systems
landscapes are dynamic and constantly evolving. When it comes to system integrations
with rapidly changing requirements, time is of the essence. System integration projects
that take longer than expected tend to get complex and difficult to pursue. In a dynamic
environment, the best approach is to go with an agile working strategy with short-term,
ad hoc objectives that slowly build toward full integration by linking various subsystems
where necessary.
 Lack of skilled resources. System integration requires a certain expertise, such as an
extensive understanding of the technology industry and the latest trends, as well as a
strong command of programming languages and system codes. Even if organizations
have top-notch integration technology in place, they also need integration experts who
have the knowledge and required integration skills. While most companies struggle to
find and retain employees with such skill sets, some hire external contractors that might
fall short of expertise.
 Lack of accountability. System integration involves various subsystems and multiple
stakeholders, including vendors and system owners, none of whom are responsible for
the entire integration process. Each involved party only takes care of their side of the
integration and doesn't offer accountability if something outside their territory goes
wrong. This ambiguity over ownership of different subsystems can cause serious issues
with accountability during system integrations and can cause project interruptions and
setbacks.
 Picking the right integration tool. From cloud to hybrid options, there's an
overabundance of systems integration tools available on the market. This can make it
challenging for businesses to pick the right one for their needs and requirements.
Integration and CRM systems
Companies strive to integrate their CRM systems with other components of the business
to help streamline the marketing and sales processes, organizing and updating customer
information with the hopes of deepening customer relationships and boosting revenue
and growth.
The following are some use cases and advantages of integrating CRM systems:
 The main goal of integrating a CRM system with other, smaller systems is to prevent
manual data entry and save employees time by removing redundant, unnecessary or
tedious steps. For example, a company might integrate its website with its marketing
automation software to bring customer information directly into the CRM system. Any
action a prospect takes on the website can be logged and a new record can be
automatically created in the system.
 Other uses of CRM integration include integrating email systems with a CRM and
automatically importing basic customer information -- such as name, company, email
address and phone number -- from emails into the CRM to facilitate follow-up contacts
and log a record of interactions. Integration with other CRM systems is a key component
of product development, as companies must make their products compatible with
existing products to cater to the customer and maximize their reach and applicability.
This often involves altering the new product's programming code to match the existing
product's code so that integration can take place. Integration during product
development also refers to the process in which separately produced components are
combined and problems in their interactions are addressed.
 Companies will often try to integrate their CRM system with their legacy ERP or
accounting systems to link financial information to assist customer service. Some
integration touchpoints can be handled by the default functionality within the software
packages themselves, but some configuration still needs to take place. Custom
functionality might need to be built in, depending on the business's needs and the
limitations of both systems.
2. Integration Patterns and Strategies

Data integration pattern 1: Migration

Migration is the act of moving data from one system to another. A migration contains a
source system where the data resides prior to execution, criteria that determine the
scope of the data to be migrated, a transformation that the data set will go through, a
destination system where the data will be inserted, and a way to capture the results of
the migration so the final state can be compared with the desired state.
Why is migration valuable?
Data migration is essential to all data systems. We spend a lot of time creating and
maintaining data, and migration is key to keep that data agnostic from the tools that we
use to create it, view it, and manage it. Without migration, we would lose all the data that
we have amassed any time that we want to change tools — crippling our ability to be
productive in the digital world.
When is migration useful?
Data migration occurs when moving from one system to another, moving from one instance
of a system to another or newer instance of that system, spinning up a new system that
extends your current infrastructure, backing up a dataset, adding nodes to database
clusters, replacing database hardware, consolidating systems, and more.
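As a rough sketch of these five elements (source, criteria, transformation, destination, and result capture), the Python snippet below migrates rows between two in-memory SQLite databases; the table and column names are hypothetical.

# A minimal migration sketch using two in-memory SQLite databases as hypothetical
# source and destination systems.
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT, active INTEGER)")
source.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                   [(1, "ana", 1), (2, "ben", 0), (3, "carla", 1)])

destination = sqlite3.connect(":memory:")
destination.execute("CREATE TABLE customers (id INTEGER, name TEXT)")

# Criteria: only active customers are in scope for this migration
rows = source.execute("SELECT id, name FROM customers WHERE active = 1").fetchall()

# Transformation: upper-case the name before inserting into the destination
migrated = [(row_id, name.upper()) for row_id, name in rows]
destination.executemany("INSERT INTO customers VALUES (?, ?)", migrated)

# Capture the result so the final state can be compared with the desired state
total = source.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(f"migrated {len(migrated)} of {total} source records")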

Data integration pattern 2: Broadcast


Broadcast can also be called “one way sync from one to many,” and it refers to moving
data from a single source system to many destination systems on an ongoing, real-time
(or near real-time) basis.
Whenever there is a need to keep our data up-to-date between multiple systems across
time, you will need either a broadcast, bi-directional sync, or correlation pattern. The
broadcast pattern, like the migration pattern, only moves data in one direction, from the
source to the destination. The broadcast pattern, unlike the migration pattern, is
transactional.
This means it does not execute the logic of the message processors for all items which
are in scope; rather, it executes the logic only for those items that have recently
changed. Think of broadcast as a sliding window that only captures those items which
have field values that have changed since the last time the broadcast ran.
Another major difference is in how the implementation of the pattern is designed.
Migration will be tuned to handle large volumes of data and process many records in
parallel and to have a graceful failure case. Broadcast patterns are optimized for
processing the records quickly and being highly reliable to avoid losing critical data in
transit.
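The "sliding window" behaviour can be sketched in a few lines of Python; the records, timestamps, and destination names below are hypothetical, and a real broadcast would push changes through a messaging or API layer rather than into in-memory lists.

# A broadcast sketch: only records whose last_modified timestamp falls after the
# previous run are pushed, one way, from the source to every destination.
from datetime import datetime

source_records = [
    {"id": 1, "status": "shipped", "last_modified": datetime(2024, 1, 10, 9, 0)},
    {"id": 2, "status": "pending", "last_modified": datetime(2024, 1, 10, 9, 45)},
]
destinations = {"dashboard": [], "fulfilment": []}

last_run = datetime(2024, 1, 10, 9, 30)           # the "sliding window" boundary

changed = [r for r in source_records if r["last_modified"] > last_run]
for name, store in destinations.items():
    store.extend(changed)                          # one-way push to each destination

last_run = datetime(2024, 1, 10, 10, 0)            # advance the window for the next run
print({name: len(store) for name, store in destinations.items()})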
Why is broadcast valuable?
The broadcast pattern is extremely valuable when system B needs to know some
information in near real time that originates or resides in system A. For example, you
may want to create a real-time reporting dashboard — a destination of multiple
broadcast applications that receives real-time updates on what is going across multiple
systems.
You may want to immediately start fulfilment of orders that come from your CRM,
eCommerce tool, or internal tool where the fulfilment processing system is centralized
regardless of which channel the order comes from. You may want to send a notification
of the temperature of your steam turbine to a monitoring system every 100ms. You may
want to broadcast to a general practitioner’s patient management system, when one of
their regular patients is checked into an emergency room. There are countless examples
of when you want to transfer data from an originating system and broadcast it to another.
When is broadcast useful?
The broadcast pattern’s “need” can easily be identified by the following criteria:
 Does system B need to know as soon as the event happens — Yes
 Does data need to flow from A to B automatically, without human involvement — Yes
 Does system A need to know what happens with the object in system B — No
The first question will help you decide whether you should use the migration pattern or
broadcast based on whether the data needs to be in real-time. Anything less than
approximately every hour will tend to be a broadcast pattern. However, there are always
exceptions based on volumes of data.
The second question generally rules out “on demand” applications and in general
broadcast patterns will either be initiated by a push notification or a scheduled job and
hence will not have human involvement.
The last question will let you know whether you need to union the two data sets so that
they are synchronized across two systems, which is what we call bi-directional sync.
Different needs will call for different data integration patterns, but the broadcast pattern is
much more flexible in how you can couple the applications and we would recommend
using two broadcast applications over a bi-directional sync application.

Data integration pattern 3: Bi-directional sync


The bi-directional sync data integration pattern is the act of combining two datasets in
two different systems so that they behave as one, while respecting their need to exist as
different datasets. This type of integration need comes from having different tools or
different systems for accomplishing different functions on the same dataset.
For example, you may have a system for taking and managing orders and a different
system for customer support. You may find that these two systems are best of breed and
it is important to use them rather than a suite which supports both functions and has a
shared database. Using bi-directional sync to share the dataset will enable you to use
both systems while maintaining a consistent real-time view of the data in both systems.
Why is bi-directional sync valuable?
Bi-directional sync can be both an enabler and a savior depending on the circumstances
that justify its need. If you have two or more independent and isolated representations of
the same reality, you can use bi-directional sync to optimize your processes.
On the other hand, you can use bi-directional sync to take you from a suite of products
that work well together — but may not be the best at their own individual function — to a
suite that you hand pick and integrate together using an enterprise integration
platform like MuleSoft’s Anypoint Platform.
When is bi-directional sync useful?
The need for a bi-directional sync integration application is synonymous with wanting
object representations of reality to be comprehensive and consistent. For example, if you
want a single view of your customer, you can solve that manually by giving everyone
access to all the systems that have a representation of the notion of a customer. But a
more efficient solution is to list out which fields need to be visible for that customer object
in which systems and which systems are the owners.
Most enterprise systems have a way to extend objects such that you can modify the
customer object data structure to include those fields. Then you can create integration
applications either as point-to-point applications (using a common integration platform) if
it’s a simple solution, or a more advanced routing system like a pub/sub or queue routing
model if there are multiple systems at play.
For example, a salesperson should know the status of a delivery, but they don’t need to
know at which warehouse the delivery is. Similarly, the delivery person needs to know
the name of the customer that the delivery is for without needing to know how much the
customer paid for it. Bi-directional synchronization allows both of those people to have a
real-time view of the same customer through the lens they need.
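A minimal sketch of this idea follows, assuming hypothetical field ownership rules: the order system owns the delivery status, the support system owns the contact name, and each change is propagated so both copies stay consistent.

# A bi-directional sync sketch with hypothetical field ownership.
orders_view  = {"customer_id": 42, "contact_name": "Ana Cruz", "delivery_status": "packed"}
support_view = {"customer_id": 42, "contact_name": "Ana Cruz", "delivery_status": "packed"}

FIELD_OWNER = {"delivery_status": "orders", "contact_name": "support"}

def sync(field: str, value, changed_in: str):
    # Only the owning system may change the field; the change is copied to the other view
    if FIELD_OWNER[field] != changed_in:
        raise ValueError(f"{changed_in} does not own {field}")
    orders_view[field] = value
    support_view[field] = value

sync("delivery_status", "out_for_delivery", changed_in="orders")
sync("contact_name", "Ana C. Cruz", changed_in="support")
print(orders_view == support_view)   # True: both systems see the same customer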

Data integration pattern 4: Correlation


The correlation data integration pattern is a design that identifies the intersection of two
data sets and does a bi-directional synchronization of that scoped dataset only if that
item occurs in both systems naturally. Whereas the bi-directional sync pattern
synchronizes the union of the scoped dataset, correlation synchronizes only the intersection.
In the case of the correlation pattern, those items that reside in both systems may have
been manually created in each of those systems, like two sales representatives entering
the same contact in both CRM systems. Or they may have been brought in as part of a
different integration. The correlation pattern will not care where those objects came from;
it will agnostically synchronize them as long as they are in both systems.
Why is correlation valuable?
The correlation data integration pattern is useful when you have two groups or systems
that want to share data only if they both have a record representing the same
item/person in reality. For example, a hospital group has two hospitals in the same city.
You might like to share data between the two hospitals so if a patient uses either
hospital, you will have an up-to-date record of what treatment they received at both
locations.
To accomplish an integration like this, you may decide to create two broadcast pattern
integrations, one from Hospital A to Hospital B, and one from Hospital B to Hospital A.
This will ensure that the data is synchronized; however, you now have two integration
applications to manage.
To alleviate the need to manage two applications, you can just use the bi-directional
synchronization pattern between Hospital A and B. But to increase efficiency, you might
want the synchronization not to bring over the records of Hospital B patients who have no
association with Hospital A, and to bring shared records over in real time as soon as a
patient's record is created. The correlation pattern is valuable because it only bi-directionally
synchronizes the objects on a “need to know” basis rather than always moving the full
scope of the dataset in both directions.
When is correlation useful?
The correlation data integration pattern is most useful when having the extra data is
more costly than beneficial because it allows you to scope out the “unnecessary” data.
For example, suppose you are a university that is part of a larger university system, and
you want to generate reports across your students.
You probably don’t want a bunch of students in those reports that never attended your
university. But you may want to include the units that those students completed at other
universities in your university system. Here, the correlation pattern would save you a lot
of effort either on the integration or the report generation side because it would allow you
to synchronize only the information for the students that attended both universities.
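A minimal sketch of the intersection idea, using hypothetical patient records matched on a shared identifier:

# A correlation sketch: only patients that exist in both hospitals (matched here on a
# shared national_id) are flagged for synchronization; records unique to one hospital
# are left alone. All identifiers and records are hypothetical.
hospital_a = {"P100": {"national_id": "N1", "treatment": "x-ray"},
              "P101": {"national_id": "N2", "treatment": "mri"}}
hospital_b = {"Q900": {"national_id": "N1", "treatment": "physio"},
              "Q901": {"national_id": "N3", "treatment": "lab"}}

ids_a = {rec["national_id"] for rec in hospital_a.values()}
ids_b = {rec["national_id"] for rec in hospital_b.values()}
shared = ids_a & ids_b                      # the intersection, not the union

for records in (hospital_a, hospital_b):
    for rec in records.values():
        if rec["national_id"] in shared:
            rec["shared_history"] = True    # mark records that will be kept in sync

print(shared)  # {'N1'}: only this patient's data flows between the hospitals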

Data integration pattern 5: Aggregation


Aggregation is the act of taking or receiving data from multiple systems and inserting it
into one. For example, customer data could reside in three different systems, and a data
analyst might want to generate a report which uses data from all of them. One could
create a daily migration from each of those systems to a data repository and then query
against that database. But then there would be another database to keep track of and
keep synchronized.
In addition, as things change in the three other systems, the data repository would have
to be constantly kept up to date. Another downside is that the data would be a day old,
so for real-time reports, the analyst would have to either initiate the migrations manually
or wait another day. One could set up three broadcast applications, achieving a situation
where the reporting database is always up to date with the most recent changes in each
of the systems. But there would still be a need to maintain this database which only
stores replicated data so that it can be queried every so often. In addition, there will be a
number of wasted API calls to ensure that the database is always up to X minutes from
reality.
This is where the aggregation pattern comes into play. If you build an application, or use
one of our templates that is built on it, you will notice that you can on demand query
multiple systems, merge the data set, and do as you please with it.
For example, you can build an integration app which queries the various systems,
merges the data and then produces a report. This way you avoid having a separate
database and you can have the report arrive in a format like .csv or the format of your
choice. You can place the report in the location where reports are stored directly.
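A minimal sketch of such an on-demand aggregation, with three hypothetical in-memory "systems" standing in for real APIs or databases:

# An aggregation sketch: three hypothetical systems are queried on demand, merged on
# customer_id, and written out as a CSV report, with no intermediate reporting database.
import csv, io

crm       = {42: {"name": "Ana Cruz"}}
billing   = {42: {"balance": 150.0}}
analytics = {42: {"visits": 7}}

def aggregate(customer_id: int) -> dict:
    merged = {"customer_id": customer_id}
    for system in (crm, billing, analytics):
        merged.update(system.get(customer_id, {}))
    return merged

report = io.StringIO()
writer = csv.DictWriter(report, fieldnames=["customer_id", "name", "balance", "visits"])
writer.writeheader()
writer.writerow(aggregate(42))
print(report.getvalue())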
Why is aggregation valuable?
The aggregation pattern derives its value from allowing you to extract and process data
from multiple systems in one united application. This means that the data is up to date at
the time that you need it, does not get replicated, and can be processed or merged to
produce the dataset you want.
When is aggregation useful?
The aggregation pattern is valuable if you are creating orchestration APIs to
“modernize” legacy systems, especially when you are creating an API which gets data
from multiple systems, and then processes it into one response. Another use case is
creating reports or dashboards that pull data from multiple systems and create an
experience with that data.
Finally, you may have systems that you use for compliance or auditing purposes which
need to have related data from multiple systems. The aggregation pattern is helpful in
ensuring that your compliance data lives in one system but can be the amalgamation of
relevant data from multiple systems. You can therefore reduce the amount of learning
that needs to take place across the various systems to ensure you have visibility into
what is going on.

3. Middleware Technologies: APIs, ESBs, Messaging

What is middleware?
Middleware is software that enables one or more kinds of communication or connectivity
between applications or application components in a distributed network. By making it
easier to connect applications that weren't designed to connect with one another, and
providing functionality to connect them in intelligent ways, middleware streamlines
application development and speeds time to market.
There are many types of middleware. Some, such as message brokers or transaction
processing monitors, focus on one type of communication. Others, such as web
application servers or mobile device middleware, provide the full range of
communication and connectivity capabilities needed to build a particular type of
application. Still others, such as a cloud-based integration platform as a service
(iPaaS) offering or an enterprise service bus (ESB), function as a centralized integration
hub for connecting all the components in an enterprise. (There's even middleware that
lets developers build their own customized middleware.)
Middleware got its name because the first middleware typically acted as a mediator
between an application front-end, or client, and a back-end resource - e.g., a database,
mainframe application or specialized hardware device - from which the client might
request data.
But today's middleware operates well beyond this scope. Portal middleware, for
example, encompasses the application front-end as well as tools for back-end
connectivity; database middleware typically includes its own data store. And as you'll
read below, an emerging class of middleware leverages container technology to help
developers connect to resources distributed across multiple clouds.

How middleware works


At the most basic level, middleware enables developers to build applications without
having to create a custom integration every time they need to connect to application
components (services or microservices), data sources, computing resources or devices.
It does this by providing services that enable different applications and services to
communicate using common messaging frameworks such as JSON (JavaScript object
notation), REST (representational state transfer), XML (extensible markup language),
SOAP (simple object access protocol), or web services. Typically, middleware also
provides services that enable components written in multiple languages - such as Java,
C++, PHP, and Python - to talk with each other.
In addition to providing this work-saving interoperability, middleware also includes
services that help developers:
 Configure and control connections and integrations. Based on information in a client
or front-end application request, middleware can customize the response from the back-
end application or service. In a retailer's ecommerce application, middleware application
logic can sort product search results from a back-end inventory database by nearest
store location, based on the IP address or location information in the HTTP
request header.

 Secure connections and data transfer. Middleware typically establishes a secure
connection from the front-end application to back-end data sources using Transport
Layer Security (TLS) or another network security protocol. And it can provide
authentication capabilities, challenging front-end application requests for credentials
(username and password) or digital certificates.

 Manage traffic dynamically across distributed systems. When application traffic
spikes, enterprise middleware can scale to distribute client requests across multiple
servers, on premises or in the cloud. And concurrent processing capabilities can prevent
problems when multiple clients try to access the same back-end data source
simultaneously.
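As a rough sketch of the retailer example above, the snippet below re-orders back-end results using location information taken from the request headers; the header name, regions, and store ranking are all hypothetical, and real middleware would apply this logic inside its request pipeline rather than in a standalone function.

# Request-aware middleware logic (sketch): sort inventory results by the store nearest
# to the client's region, read from a hypothetical "X-Client-Region" header.
def handle_search(request_headers: dict, inventory_results: list) -> list:
    region = request_headers.get("X-Client-Region", "manila")
    # Hypothetical per-region ranking of stores, nearest first
    nearest = {"manila": ["manila", "cebu", "davao"],
               "cebu":   ["cebu", "manila", "davao"]}.get(region, [])
    return sorted(inventory_results,
                  key=lambda r: nearest.index(r["store"]) if r["store"] in nearest else 99)

results = [{"sku": "A1", "store": "davao"}, {"sku": "A1", "store": "manila"}]
print(handle_search({"X-Client-Region": "manila"}, results))  # nearest store first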
Types of middleware
There are many different types of middleware. Some focus on specific types of
connectivity, others on specific applications, application components and devices; some
combine middleware capabilities for a specific development task. Some of the best-
known and most commonly-used types of middleware software include:
Message-oriented middleware (MOM) enables application components that use different
messaging protocols to communicate and exchange messages. In addition to translating -
or transforming - messages between applications, MOM manages routing of the
messages so they always get to the proper components in the proper order.
Examples of MOM include message queues and message brokers.
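The core idea can be sketched with the Python standard library's queue acting as a stand-in for a broker; real MOM products add persistence, routing, transformation, and delivery guarantees on top of this.

# A toy message-queue sketch: a producer publishes messages and a consumer receives
# them in order, decoupled from one another by the queue in between.
import queue, threading

broker = queue.Queue()

def producer():
    for i in range(3):
        broker.put({"order_id": i, "event": "created"})
    broker.put(None)                       # sentinel: no more messages

def consumer():
    while (msg := broker.get()) is not None:
        print("received in order:", msg)

t1, t2 = threading.Thread(target=producer), threading.Thread(target=consumer)
t1.start(); t2.start(); t1.join(); t2.join()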

Remote procedure call (RPC) middleware enables one application to trigger a procedure
in another application - running on the same computer or on a different
computer or network - as if both were part of the same application on the same
computer.

Data or database middleware simplifies access to, and interaction with, back-end
databases. Typically database middleware is some form of SQL database server.
API (application programming interface) middleware provides tools developers can
use to create, expose and manage APIs for their applications - so that other developers
can connect to them. Some API middleware includes tools for monetizing APIs -
enabling other organizations to use them, at cost. Examples of API middleware include
API management platforms, API gateways and API developer portals.

Object request broker (ORB) middleware acts as a broker between a request from one
application object or component and the fulfillment of that request by another object or
component on the distributed network. ORBs operate with the Common Object Request
Broker Architecture (CORBA), which enables one software component to make a
request of another without knowing where the other is hosted or what its UI looks like - the
"brokering" handles this information during the exchange.
Transactional middleware provides services to support the execution of data
transactions across a distributed network. The best-known transactional
middleware are transaction processing monitors (TPMs), which ensure that transactions
proceed from one step to the next - executing the data exchange,
adding/changing/deleting data where needed, etc. - through to completion.

Asynchronous data streaming middleware replicates a data stream in an intermediate
store, enabling data sharing between multiple applications. Apache Kafka is
one of the best-known examples of middleware for real-time data streaming.
Device middleware provides a focused set of integration and connectivity capabilities
for developing apps for a specific mobile OS.

Portal middleware provides tools and resources for integrating content and capabilities
from various related applications 'at the glass' - or on a single screen - to create a single,
composite application.
Robotics middleware simplifies the process of integrating robotic hardware, firmware
and software from multiple manufacturers and locations.
Enterprise application integration middleware

Enterprise application integration middleware lets an organization establish an enterprise
integration hub - a standardized way to connect all applications, application components,
business processes and back-end data sources in the extended enterprise.
Until about ten years ago, the most prevalent enterprise application integration
middleware was the enterprise service bus (ESB), which served as the integration hub
within a service-oriented architecture (SOA). Today iPaaS lets an organization connect
applications, data, processes and services across on-premises, private cloud, and public
cloud environments - without the work and expense of purchasing, installing, managing,
and maintaining the integration middleware (and the hardware it runs on) within its own
data center.
Platform middleware

Platform middleware (or application platform middleware) can further support application
development and accelerate application delivery by providing a runtime hosting
environment - such as a Java runtime environment (Java RE), or containers, or both - for
application or business logic. Platform middleware can include or combine enterprise
application servers, web servers, and content management systems, as well as the
other middleware types listed above.
Middleware and cloud-native applications
Cloud-native is an application development approach that leverages fundamental cloud
computing technologies, with the goal of providing a consistent development, deployment,
and management experience across on-premises, private cloud, and public cloud
environments.
Practically speaking, today's cloud-native applications are applications built
from microservices and deployed in containers orchestrated using Kubernetes.
Microservices are loosely-coupled application components that encompass their own
stack, and can be deployed and updated independently of one another, and
communicate with one another using a combination of REST APIs, message brokers
and event streams. Containers are lightweight executables that package application
code – such as microservices – together with just the OS libraries and dependencies
needed to run that code on any traditional IT or cloud infrastructure.
Together these and related technologies create a powerful, develop-once-deploy-
anywhere platform for delivering net-new hybrid cloud applications, and
for modernizing traditional legacy systems for use in the cloud. But they also lead to a
complex development environment that combines even more software applications, data
sources, programming languages, tools and distributed systems.
Middleware can resolve some of this complexity, but running containerized applications
with conventional middleware can add complexities of its own, as well as the kind of
infrastructure overhead that containers were designed to eliminate. For this reason,
popular cloud application development platforms such as Cloud Foundry and Red Hat
OpenShift evolved to include containerized middleware -
middleware modularized so that just the required connectivity functions can be packaged
in a container.

4. Service-Oriented Architecture (SOA)

What is service-oriented architecture?


Service-oriented architecture (SOA) is a method of software development that uses
software components called services to create business applications. Each service
provides a business capability, and services can also communicate with each other
across platforms and languages. Developers use SOA to reuse services in different
systems or combine several independent services to perform complex tasks.
For example, multiple business processes in an organization require the user
authentication functionality. Instead of rewriting the authentication code for all business
processes, you can create a single authentication service and reuse it for all
applications. Similarly, almost all systems across a healthcare organization, such as
patient management systems and electronic health record (EHR) systems, need to
register patients. These systems can call a single, common service to perform the
patient registration task.
What are the benefits of service-oriented architecture?
Service-oriented architecture (SOA) has several benefits over the traditional monolithic
architectures in which all processes run as a single unit. Some major benefits of SOA
include the following:
Faster time to market
Developers reuse services across different business processes to save time and costs.
They can assemble applications much faster with SOA than by writing code and
performing integrations from scratch, which helps them prioritize and streamline
development processes.
Efficient maintenance
It’s easier to create, update, and debug small services than large code blocks in
monolithic applications. Modifying any service in SOA does not impact the overall
functionality of the business process.

Greater adaptability
SOA is more adaptable to advances in technology. You can modernize your applications
efficiently and cost effectively. For example, healthcare organizations can use the
functionality of older electronic health record systems in newer cloud-based applications.
What are the basic principles of service-oriented architecture?
There are no well-defined standard guidelines for implementing service-oriented
architecture (SOA). However, some basic principles are common across all SOA
implementations.
Interoperability
Each service in SOA includes description documents that specify the functionality of the
service and the related terms and conditions. Any client system can run a service,
regardless of the underlying platform or programming language. For instance, business
processes can use services written in both C# and Python. Since there are no direct
interactions, changes in one service do not affect other components using the service.
Loose coupling
Services in SOA should be loosely coupled, having as little dependency as possible on
external resources such as data models or information systems. They should also be
stateless without retaining any information from past sessions or transactions. This way,
if you modify a service, it won’t significantly impact the client applications and other
services using the service.
Abstraction
Clients or service users in SOA need not know the service's code logic or
implementation details. To them, services should appear like a black box. Clients get the
required information about what the service does and how to use it through service
contracts and other service description documents.
Granularity
Services in SOA should have an appropriate size and scope, ideally packing one
discrete
business function per service. Developers can then use multiple services to create a
composite service for performing complex operations.
What are the components in service-oriented architecture?
There are five main components in service-oriented architecture (SOA).

Service
Services are the basic building blocks of SOA. They can be private—available only to
internal users of an organization—or public—accessible over the internet to all.
Individually, each service has three main features.
Service implementation
The service implementation is the code that builds the logic for performing the specific
service function, such as user authentication or bill calculation.
Service contract
The service contract defines the nature of the service and its associated terms and
conditions, such as the prerequisites for using the service, service cost, and quality of
service provided.

Service interface
In SOA, other services or systems communicate with a service through its service
interface. The interface defines how you can invoke the service to perform activities or
exchange data. It reduces dependencies between services and the service requester.
For example, even users with little or no understanding of the underlying code logic can
use a service through its interface.
Service provider
The service provider creates, maintains, and provides one or more services that others
can use. Organizations can create their own services or purchase them from third-party
service vendors.
Service consumer
The service consumer requests the service provider to run a specific service. It can be
an entire system, application, or other service. The service contract specifies the rules
that the service provider and consumer must follow when interacting with each other.
Service providers and consumers can belong to different departments, organizations,
and even industries.
Service registry
A service registry, or service repository, is a network-accessible directory of available
services. It stores service description documents from service providers. The description
documents contain information about the service and how to communicate with it.
Service consumers can easily discover the services they need by using the service
registry.
How does service-oriented architecture work?
In service-oriented architecture (SOA), services function independently and provide
functionality or data exchanges to their consumers. The consumer requests information
and sends input data to the service. The service processes the data, performs the task,
and sends back a response. For example, if an application uses an authorization
service, it gives the service the username and password. The service verifies the
username and password and returns an appropriate response.
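A minimal sketch of that request-response flow, with a hypothetical authentication service hidden behind a small interface so the consumer never sees the implementation details:

# An SOA-style request-response sketch: the consumer sends input data to the service
# and acts on the response. The credential store and names are hypothetical.
class AuthService:
    """Service implementation: the logic behind the interface."""
    _users = {"ana": "s3cret"}             # hypothetical credential store

    def authenticate(self, username: str, password: str) -> dict:
        ok = self._users.get(username) == password
        return {"authenticated": ok, "user": username if ok else None}

def consumer(service: AuthService) -> None:
    # The consumer only depends on the service interface, not its internals
    response = service.authenticate("ana", "s3cret")
    print("login allowed" if response["authenticated"] else "login denied")

consumer(AuthService())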
Communication protocols
Services communicate using established rules that determine data transmission over a
network. These rules are called communication protocols. Some standard protocols to
implement SOA include the following:
• Simple Object Access Protocol (SOAP)
• RESTful HTTP
• Apache Thrift
• Apache ActiveMQ
• Java Message Service (JMS)
You can even use more than one protocol in your SOA implementation.
What is an ESB in service-oriented architecture?
An enterprise service bus (ESB) is software that you can use when communicating
with a system that has multiple services. It establishes communication between services
and service consumers no matter what the technology.
Benefits of ESB
An ESB provides communication and transformation capabilities through a reusable
service interface. You can think of an ESB as a centralized service that routes service
requests to the appropriate service. It also transforms the request into a format that is
acceptable for the service’s underlying platform and programming language.
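A toy sketch of those routing and transformation roles follows; the service names, payload fields, and rules are hypothetical, and a real ESB adds security, monitoring, and protocol bridging on top of this.

# A toy ESB sketch: the bus inspects each request, routes it to the matching service,
# and transforms the payload into the format that service expects.
def billing_service(payload):            # expects amounts in cents
    return {"charged_cents": payload["amount_cents"]}

def shipping_service(payload):           # expects an upper-case country code
    return {"route": payload["country"]}

ROUTES = {"charge": billing_service, "ship": shipping_service}

def esb(request: dict):
    service = ROUTES[request["type"]]                            # routing
    if request["type"] == "charge":
        payload = {"amount_cents": round(request["amount"] * 100)}  # transformation
    else:
        payload = {"country": request["country"].upper()}
    return service(payload)

print(esb({"type": "charge", "amount": 19.99}))
print(esb({"type": "ship", "country": "ph"}))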
What are the limitations in implementing service-oriented architecture?
Limited scalability
System scalability is significantly impacted when services share many resources and
need to coordinate to perform their functionality.
Increasing interdependencies
Service-oriented architecture (SOA) systems can become more complex over time and
develop several interdependencies between services. They can be hard to modify or
debug if several services are calling each other in a loop. Shared resources, such as
centralized databases, can also slow down the system.
Single point of failure
For SOA implementations with an ESB, the ESB creates a single point of failure. It is a
centralized service, which goes against the idea of decentralization that SOA advocates.
Clients and services cannot communicate with each other at all if the ESB goes down.

What are microservices?


Microservices architecture is made up of very small and completely independent
software components, called microservices, that specialize and focus on one task only.
Microservices communicate through APIs, which are rules that developers create to let
other software systems communicate with their microservice.
The microservices architectural style is best suited to modern cloud computing
environments. They often operate in containers—independent software units that
package code with all its dependencies.
Benefits of microservices
Microservices are independently scalable, fast, portable, and platform agnostic—
characteristics native to the cloud. They are also decoupled, which means they have
limited to no dependencies on other microservices. To achieve this, microservices have
local access to all the data they need instead of remote access to centralized data that
other systems also access and use. This creates data duplication, which microservices
make up for with gains in performance and agility.
SOA compared to microservices
Microservices architecture is an evolution of the SOA architectural style. Microservices
address the shortcomings of SOA to make the software more compatible with modern
cloud-based enterprise environments. They are fine grained and favor data duplication
as opposed to data sharing. This makes them completely independent with their own
communication protocols that are exposed through lightweight APIs. It’s essentially the
consumers' job to use the microservice through its API, thus removing the need for a
centralized ESB.
How does AWS help you implement microservices?
AWS is a great place to build modern applications with modular architectural patterns,
serverless operational models, and agile development processes. It offers the most
complete platform for building highly available microservices to power modern
applications of any scope and scale. For example, you can do the following:
• Build, isolate, and run secure microservices in managed containers to simplify
operations and reduce management overhead.
• Use AWS Lambda to run your microservices without provisioning and managing
servers.
• Choose from 15 relational and non-relational purpose-built AWS databases to support
microservices architecture.
• Easily monitor and control microservices running on AWS with AWS App Mesh.
• Monitor and troubleshoot complex microservice interactions with AWS X-Ray.

Microservices on AWS help you innovate faster, reduce risk, accelerate time to market,
and decrease your total cost of ownership. Get started with SOA and microservices on
AWS by creating an AWS account today.

5. Microservices Architecture and Integration

A microservices architecture fosters the building of software applications as a suite of
independent, fine-grained, and autonomous services. Therefore, when we build a real-
world business use case, the microservices that comprise the application have to
communicate with each other. With the proliferation of fine-grained services, integrating
microservices and building inter-service communication has become one of the most
challenging tasks in the realization of microservices architectures.
To understand the challenges of a microservices architecture, let’s first look at the very
near past. In the pre-microservices era of service-oriented architecture (SOA) and web
services, we would use a central enterprise service bus (ESB) architecture, where all of
the service composition and integrations were implemented.
For example, as shown in Figure 1, all of the services were integrated with an ESB, and
selected business functions were exposed to the consumers via an API management
layer. The ESB provided all of the capabilities required to integrate disparate APIs, data,
and systems.
Figure 1: A centralized integration architecture using an enterprise service bus. (Source: WSO2)
However, when we move into a microservices architecture, having a monolithic
integration layer with a large pile of business logic makes it really hard to achieve the
fundamental concepts of microservices, such as being autonomous and oriented toward
a narrow set of business capabilities. Therefore, it is not practical to use a central ESB
as the integration bus.
Instead, with a microservices architecture, microservices are integrated using the smart
endpoints and dumb pipe style, where all of the intelligence lives at the endpoints, which
are interconnected via a dumb messaging infrastructure. As shown in Figure 2, we can
design microservices for the business capabilities that we have identified, and they have
to communicate with each other to realize various business use cases.

Figure 2: A microservices architecture is characterized by smart endpoints and dumb pipes. (Source: WSO2)
Although it sounds quite simple to break up a monolithic app and central ESB and
offload those functions to each service, there are quite a few challenges that we face.
So, it’s quite important to understand the common microservices integration patterns and
select the suitable technologies and frameworks for implementing them. Let’s look at two
of these key microservice integration patterns, active composition and reactive
composition.
Active composition pattern for microservices architecture
In an active composition pattern, microservices communicate using request-response
style communication, where a given microservice can invoke one or more other
microservices using synchronous messaging protocols such as REST/HTTP, gRPC
remote procedure call, and the GraphQL API query language, among others. Figure 3
illustrates such a microservices integration scenario where services are organized based
on different interaction types.
In an active composition microservice implementation, we can identify the core or
atomic services. These are fine-grained, self-contained services, with no or minimal
external service dependencies, that consist mostly of business logic and little or no
network communication logic.

Figure 3: In the active composition pattern, microservices communicate using synchronous messaging protocols. (Source: WSO2)
Atomic microservices often cannot be directly mapped to a business function as they are
too fine-grained. Hence a specific business function may require a composition of
multiple atomic or core services. The middle layer comprises such composite or
integration microservices. These services often have to support a significant portion of
the ESB functions such as routing, transformations, orchestration, resiliency, and
stability.
By contrast, composite or integration services are coarse-grained relative to atomic
services. They are independent from each other and contain business logic (e.g. routing,
what services to call, and how to do data type mapping) and network communication
logic (e.g. inter-service communication through various protocols and resiliency
behaviors such as circuit breakers).
You will expose a selected set of your composite services—or even some atomic
services—as managed APIs using API services and/or edge services. These are special
types of composite services that apply basic routing capabilities, versioning of APIs, API
security patterns, throttling, and monetization, as well as create API compositions. As
shown in Figure 3, these APIs are centrally managed by a control plane or API
management layer.
The benefit of active composition is that you can select diverse implementation
technologies to implement core, composite, and API services. However, a drawback of
this pattern is the inter-service coupling and dependencies. A given service may depend
on one or more downstream services. So, active composition is suitable only for certain
microservices interactions where interactive request-response style is a key
requirement.
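A minimal sketch of active composition follows, where plain function calls stand in for synchronous REST or gRPC invocations and all service and field names are hypothetical.

# Active composition sketch: a composite service invokes two atomic services with
# synchronous request-response calls and combines their results.
def customer_service(customer_id: int) -> dict:        # atomic service
    return {"id": customer_id, "name": "Ana Cruz"}

def order_service(customer_id: int) -> list:           # atomic service
    return [{"order_id": 1, "total": 19.99}]

def customer_summary_service(customer_id: int) -> dict:   # composite service
    customer = customer_service(customer_id)            # synchronous call #1
    orders = order_service(customer_id)                  # synchronous call #2
    return {"customer": customer, "order_count": len(orders),
            "total_spent": sum(o["total"] for o in orders)}

print(customer_summary_service(42))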

Reactive composition pattern for microservices architecture


With reactive composition, services can communicate and create compositions using
asynchronous, event-driven messaging. We can use one of two techniques to implement
reactive composition: publish-subscribe messaging for multiple consumers or queue-
based messaging for a single consumer.
As illustrated in Figure 4, in the reactive composition pattern microservices do not talk
directly to each other; rather they send and receive messages to and from a centralized
event or message bus. The business logic of a given service is implemented in such a
way that, upon the arrival of an event, the service business logic is applied and
executed. Service business logic also may submit new events to the event bus. This bus
acts as the dumb messaging infrastructure, and all of the business logic is implemented
at the producer or consumer level.
Figure 4: In the reactive composition pattern, microservices communicate indirectly through an event bus. (Source: WSO2)
Messaging techniques commonly used in the reactive composition are the Apache Kafka
messaging protocol, the NATS cloud-native messaging protocol, and the AMQP protocol
found in messaging software such as RabbitMQ, ActiveMQ, and Artemis.
Reactive composition eliminates the tight coupling between the services and makes
services more autonomous. The reactive composition is often used in other microservice
patterns such as event sourcing and Command and Query Responsibility
Segregation (CQRS), among others.
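A minimal sketch of reactive composition follows, using an in-memory topic registry as the "dumb" bus; a real implementation would publish to Kafka, NATS, or an AMQP broker instead, and the topic and handler names are hypothetical.

# Reactive composition sketch: services never call each other directly; they publish
# events to a dumb bus and subscribers react when events arrive.
from collections import defaultdict

subscribers = defaultdict(list)          # topic -> list of handler functions

def subscribe(topic, handler):
    subscribers[topic].append(handler)

def publish(topic, event):
    for handler in subscribers[topic]:
        handler(event)                   # each consumer applies its own business logic

subscribe("order.created", lambda e: print("billing service charges", e["total"]))
subscribe("order.created", lambda e: print("shipping service packs order", e["id"]))

publish("order.created", {"id": 1, "total": 19.99})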

In CQRS, a query is simply a request for information. Similarly, in database management
a query is a request for data: if you need to access, manipulate, delete, or retrieve data
from a relational database, you write a database query using a specific syntax.
Now let’s take a look at how these patterns are applied in real-world microservices
implementations.
Active and reactive patterns in real-world microservices implementations
In most real-world microservices implementations, we have to use a hybrid of active and
reactive composition. As shown in Figure 5, we can use synchronous messaging-based
composition for most of the external facing services. Such functions are often exposed to
the consumers via an API gateway layer, which is comprised of multiple API services,
and these functions are managed centrally by the API control plane.
Usually, the communications between API services and consumers leverage RESTful
messaging or the GraphQL API query language. API services and management layers
can be implemented using open source technologies such as WSO2 API Manager,
Kong, and Gluu or commercial alternatives such as Apigee, IBM, Software AG, and
Tibco.
Figure 5: Real-world microservices implementations typically make use of both active and reactive composition. (Source: WSO2)
Both active and reactive composition styles require a rich integration framework or
programming language that can cater to various integration and composition needs. For
example, these may include message transformations, connectors to different systems
and services, synchronous or asynchronous event-driven messaging, or support for a
variety of different message exchange patterns. There are a few open source integration
frameworks that are microservices friendly including Camel-K, WSO2 Micro Integrator,
and Micronaut.io. These capabilities also are supported in certain programming
languages (e.g. Ballerina.io) and language frameworks (e.g. Spring Boot).
Rather than using conventional RESTful messaging, the internal synchronous
communication between services can leverage gRPC as the communication technology,
owing to its performance, type safety, and contract-driven nature.
The asynchronous event-driven services interaction can utilize event bus technology,
such as software leveraging the Apache Kafka, NATS, or AMQP protocols. The
asynchronous services should have integration logic that can consume or produce
messages and exchange them with the event bus. Most of the integration technologies
that we have discussed in this article can be used, but these same messaging standards
are often supported within the libraries of most standard programming languages as
well. When using messaging infrastructure, it's important to keep business logic out of
the event bus itself, so the bus remains a dumb pipe.
In this article, we discussed the two main approaches to a pure microservices
implementation. However, in most real-world scenarios, microservices-based
applications have to be integrated with monolithic systems in the enterprise. Thus you
might want to build a bridging layer, often known as the anti-corruption layer, between
the microservices and monolithic subsystems. In most practical use cases, the
microservices and monolithic subsystems co-exist side by side, with the anti-corruption
layer allowing the two to be seamlessly integrated without changing either of those
systems. Existing integration technologies, such as an ESB or integration bus, can be
used to implement the anti-corruption layer.
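A minimal sketch of an anti-corruption layer follows, where a thin adapter translates a hypothetical legacy record shape into the model the microservices expect, so neither side has to change.

# Anti-corruption layer sketch; all field names and values are hypothetical.
def legacy_customer_lookup(cust_no: str) -> dict:
    # Stand-in for a call into the monolithic system
    return {"CUST_NO": cust_no, "CUST_NM": "CRUZ, ANA", "STAT_CD": "A"}

def anti_corruption_layer(cust_no: str) -> dict:
    legacy = legacy_customer_lookup(cust_no)
    last, first = [part.strip().title() for part in legacy["CUST_NM"].split(",")]
    return {"customer_id": legacy["CUST_NO"],
            "full_name": f"{first} {last}",
            "active": legacy["STAT_CD"] == "A"}

print(anti_corruption_layer("000123"))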
What is Camel K?
Apache Camel K is an open-source integration framework that allows developers to
build and deploy cloud-native applications and microservices quickly and easily. It
leverages the power of Apache Camel, which is a popular open-source integration
framework, and runs on top of Kubernetes, a popular container orchestration platform.
The “K” in Camel K stands for Kubernetes, which means that Camel K is designed to
work seamlessly with Kubernetes. It provides a Kubernetes-native way of building and
deploying microservices using Camel-based integrations.
One of the key features of Apache Camel K is its simplicity. It provides a lightweight and
easy-to-use development model that allows developers to focus on building business
logic instead of worrying about the underlying infrastructure.
Apache Camel K also provides a wide range of connectors and components that can be
used to integrate with various systems and services, such as databases, messaging
systems, and cloud services.
Overall, Apache Camel K is a powerful and flexible integration framework that is well-
suited for building and deploying cloud-native applications and microservices on
Kubernetes.
Use cases of Apache Camel K
Apache Camel K is an integration platform built on top of Apache Camel that enables
developers to easily build and deploy cloud-native integrations using Kubernetes. Here
are some common use cases for Apache Camel K:
1. Cloud-native integrations: Apache Camel K allows you to build cloud-native integrations
that can be easily deployed and managed on Kubernetes. This makes it an ideal
platform for integrating microservices and other cloud-based systems.
2. Event-driven architecture: Apache Camel K provides a powerful event-driven
architecture that can be used to build real-time integrations that respond to events as
they occur. This makes it an ideal platform for building event-driven applications that
need to respond to changes in real-time.
3. IoT integrations: Apache Camel K can be used to build integrations for IoT devices and
sensors. This makes it an ideal platform for building IoT applications that need to collect
data from a large number of sensors and devices.
4. API management: Apache Camel K can be used to build APIs that can be easily
integrated with other systems. This makes it an ideal platform for building API gateways
and managing API traffic.
5. Data integration: Apache Camel K can be used to build data integrations that can move
data between different systems. This makes it an ideal platform for building data
pipelines that need to move data between different databases, systems, or applications.
6. ETL: Apache Camel K can be used to build ETL (Extract, Transform, Load) processes
that can be used to move data from one system to another. This makes it an ideal
platform for building data processing pipelines that need to transform data in real-time.
Overall, Apache Camel K is a powerful integration platform that can be used to build a
wide range of cloud-native integrations that can be easily deployed and managed on
Kubernetes.
WSO2 Enterprise Integrator
WSO2 Enterprise Integrator 7.1.0 offers a powerful configuration-driven approach to
integration, which allows developers to build integration solutions graphically.
It is a hybrid platform that enables API-centric integration and supports various
integration architecture styles: microservices architecture, cloud-native architecture, or a
centralized ESB architecture. The platform offers a graphical, configuration-driven
approach to developing integrations for any of these architectural styles.
Centralized ESB
The heart of WSO2 EI 7.1 is the Micro Integrator server, which is an event-driven,
standards-based messaging engine (the Bus). This ESB supports message routing,
message transformations, and other types of messaging use cases. If your organization
uses an API-driven, centralized, integration architecture, the Micro Integrator can be
used as the central integration layer that implements the message mediation logic
connecting all the systems, data, events, APIs, etc. in your integration ecosystem.
Microservices
The Micro Integrator of WSO2 EI is also lightweight and container friendly. This allows
you to leverage the comprehensive enterprise messaging capabilities of the Micro
Integrator in your decentralized, cloud-native integrations.
If your organization runs a decentralized, cloud-native integration architecture where
microservices are used to integrate the various APIs, events, and systems, the WSO2
Micro Integrator can easily function as your Integration (micro)services and API
(micro)services layer.
Low code integration
The WSO2 Micro Integrator is coupled with WSO2 Integration Studio, a comprehensive
graphical integration flow designer for building integrations using simple drag-and-drop
functionality.
Administration
The Micro Integrator Dashboard and Command Line Interface (CLI) are designed
specifically for monitoring and administering Micro Integrator instances. Each of these
tools binds to a single server instance by invoking the management API exposed by that
server. This allows you to view and manage artifacts, logs/log configurations, and users
of a server instance.
A command-line interface (CLI) is a text-based user interface (UI) used to run programs,
manage computer files and interact with the computer. Command-line interfaces are also
called command-line user interfaces, console user interfaces and character user
interfaces.
Streaming integration
The Streaming Integrator of WSO2 EI 7.1 is a cloud-native, lightweight component that
understands, captures, analyzes, processes, and acts upon streaming data and events
in real time. It utilizes the SQL-like query language ‘Siddhi’ to implement the solution.
Correlation ids are unique identifiers that help you trace requests across multiple
services in a distributed system. They are essential for debugging, logging, and
monitoring the performance and behavior of your microservices.
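A minimal, framework-agnostic sketch of the idea: generate an id once at the edge of the system and pass it along as a request header so downstream services and logs can be correlated. The code below uses only the JDK's HttpClient and UUID; the X-Correlation-ID header name and the internal URL are conventions assumed for illustration.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.UUID;

public class CorrelationIdExample {
    public static void main(String[] args) throws Exception {
        String correlationId = UUID.randomUUID().toString();

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://billing.internal/invoices")) // illustrative internal endpoint
                .header("X-Correlation-ID", correlationId)           // propagate to the downstream service
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Log the same id locally so this hop can be tied to the downstream service's logs
        System.out.println("[" + correlationId + "] status=" + response.statusCode());
    }
}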
An aggregate is composed of at least one entity: the aggregate root, also called root
entity or primary entity. Additionally, it can have multiple child entities and value objects,
with all entities and objects working together to implement required behavior and
transactions.
The Streaming Integrator allows you to integrate static data sources with streaming data
sources. Thus, it enables various types of applications (e.g., files, cloud-based
applications, data stores, streaming applications) to access streaming data and also
exposes their output in a streaming manner. This is useful for performing ETL (Extract,
Transform, Load) operations, capturing change data (i.e., CDC operations), and stream
processing.
Micronaut framework
Micronaut is an open source JVM-based software framework for building lightweight,
modular applications and microservices. Micronaut is known for its ability to help
developers create apps and microservices with small memory footprints and short
startup times. An important advantage of the Micronaut framework is that startup time
and memory consumption are not tied to the size of an app's codebase. This makes the
development of integration tests much easier and their execution much faster.
A big difference between Micronaut and other frameworks is that Micronaut analyzes
metadata as soon as the application is compiled. During this compilation phase
Micronaut will generate an additional set of classes that represent the state of the
application already preconfigured. This enables dependency injection (DI) and aspect-
oriented programming (AOP) behavior to be applied much more efficiently when the
application finally runs.
Micronaut, which was introduced in 2018 by the creators of the Grails framework,
provides native support for service discovery, distributed configuration, client-side load
balancing and authentication. The framework is licensed under the Apache License v2
and the Micronaut Foundation oversees best practices and documentation.
What should developers know about Micronaut?
The Micronaut framework was designed for building and testing low-memory
microservices, serverless applications and message-driven microservices.
It does this by avoiding the common disadvantages of most traditional Java frameworks,
including runtime reflection for dependency injection, dynamic classloading, runtime byte
code generation and proxy generation. The framework was designed specifically with
cloud architecture in mind. Apache Maven and Gradle can be used as build tools.
Components of the modular framework predefine how programmers address the
following:
• dependency injection
• inversion of control (IoC)
• aspect-oriented programming (AOP)
• configuration and configuration sharing
• service discovery
• HTTP routing
• client-side load-balancing
• proxies
How does Micronaut work?
Micronaut is designed to function as both a client and server framework. The framework
features an annotation-based programming model that is very similar to the Java Spring
framework. Unlike the Spring framework, however, Micronaut does not use Java
Reflection APIs. Instead, it integrates directly with the Java compiler through annotation
processors. This allows Micronaut to compute an additional set of classes that sit
alongside user-defined classes. The classes serve to perform dependency injection, as
well as define compilation time and aspect-oriented proxies in a completely reflection-
free manner.
Because of the way Micronaut integrates directly with language compilers, however, the
only JVM languages Micronaut can currently support are Java, Kotlin or Groovy. Plans
exist to introduce support for other languages in the future.
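As a small illustration of that annotation-based model, a hypothetical Micronaut HTTP endpoint looks much like its Spring counterpart, except that the routing and injection metadata are resolved by the annotation processors at compile time rather than through runtime reflection.

import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Get;

@Controller("/hello")            // route metadata is computed at compile time
public class HelloController {

    @Get("/{name}")              // path variable bound to the method parameter
    public String greet(String name) {
        return "Hello, " + name;
    }
}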
Why is Micronaut so well-suited for microservices?
Micronaut has many features that are tailor-made for microservices, including the
following:
Reactive Streams. Micronaut supports any framework that implements the Reactive
Streams standard. The framework also integrates with reactive database drivers for SQL
and NoSQL databases.
Message-driven microservices. Micronaut supports many different messaging
systems, including Kafka, RabbitMQ, MQTT, JMS and NATS.io. Pulsar support is
planned for a future release.
Serverless functions. Micronaut supports the development, testing and deployment of
serverless functions for many different cloud providers, including AWS Lambda, Oracle
Functions, Google Cloud Functions and Azure Functions.
OpenAPI documentation. Micronaut creates a YAML file at compilation time that can
be added as a static resource later on or served through the browser using tools such as
Swagger UI.
GraalVM-ready. Micronaut applications can be compiled into GraalVM native images to
reduce startup times even more. GraalVM uses a closed-world static analysis approach
to produce native images that require configuration for any dynamic parts of the
application. Micronaut's relative lack of reflection and dynamic classloading mean that
less configuration is required to get a GraalVM native image operational.
What are some alternatives to Micronaut?
Alternatives to the Micronaut framework include:
• Spring Boot
• Node.js
• Django
• ASP.NET
• Laravel
• Android SDK
• Rails
Some frameworks in the list above, including ASP.NET and Spring, have begun to
integrate some of the ideas pioneered by Micronaut.
6. Data Integration and Transformation
I. DATA INTEGRATION:
Data integration is one of the steps of data pre-processing that involves combining data
residing in different sources and providing users with a unified view of these data.
• It merges the data from multiple data stores (data sources)
• It includes multiple databases, data cubes or flat files.
• Metadata, Correlation analysis, data conflict detection, and resolution of semantic
heterogeneity contribute towards smooth data integration.
• There are two major approaches for data integration, commonly known as the "tight
coupling approach" and the "loose coupling approach".
Tight Coupling
o Here data is pulled over from different sources into a single physical location through
the process of ETL - Extraction, Transformation and Loading.
o The single physical location provides a uniform interface for querying the data.
o The ETL layer helps to map the data from the sources so as to provide a uniform data
warehouse. This approach is called tight coupling since the data is tightly coupled with
the physical repository at the time of query.
ADVANTAGES:
1. Independence (Lesser dependency to source systems since data is physically copied
over)
2. Faster query processing
3. Complex query processing
4. Advanced data summarization and storage possible
5. High Volume data processing
DISADVANTAGES:
1. Latency (since data needs to be loaded using ETL)
2. Costlier (data localization, infrastructure, security)
Loose Coupling
o Here a virtual mediated schema provides an interface that takes the query from the
user, transforms it in a way the source database can understand and then sends the
query directly to the source databases to obtain the result.
o In this approach, the data only remains in the actual source databases.
o However, mediated schema contains several "adapters" or "wrappers" that can
connect back to the source systems in order to bring the data to the front end.
ADVANTAGES:
Data Freshness (low latency - almost real time)
Higher Agility (when a new source system comes or existing source system changes -
only the corresponding adapter is created or changed - largely not affecting the other
parts of the system)
Less costly (Lot of infrastructure cost can be saved since data localization not required)
DISADVANTAGES:
1. Semantic conflicts
2. Slower query response
3. High order dependency to the data sources
For example, let's imagine that an electronics company is preparing to roll out a new
mobile device. The marketing department might want to retrieve customer information
from a sales department database and compare it to information from the product
department to create a targeted sales list. A good data integration system would let the
marketing department view information from both sources in a unified way, leaving out
any information that didn't apply to the search.
II. DATA TRANSFORMATION:
In data mining pre-processing, and especially in metadata and data warehousing, we use
data transformation to convert data from a source data format into the destination data
format.
We can divide data transformation into 2 steps:
• Data Mapping:
It maps the data elements from the source to the destination and captures any
transformation that must occur.
• Code Generation:
It creates the actual transformation program.
Data transformation:
• Here the data are transformed or consolidated into forms appropriate for mining.
• Data transformation can involve the following:
Smoothing:
• It works to remove noise from the data.
• It is a form of data cleaning where users specify transformations to correct data
inconsistencies.
• Such techniques include binning, regression, and clustering.
Aggregation:
• Here summary or aggregation operations are applied to the data.
• This step is typically used in constructing a data cube for analysis of the data at
multiple granularities.
• Aggregation is a form of data reduction.
Generalization:
• Here low-level or “primitive” (raw) data are replaced by higher-level concepts through
the use of concept hierarchies.
• For example, attributes, like age, may be mapped to higher-level concepts, like youth,
middle-aged, and senior.
• Generalization is a form of data reduction.
Normalization:
• Here the attribute data are scaled so as to fall within a small specified range, such as
−1.0 to 1.0, or 0.0 to 1.0.
• Normalization is particularly useful for classification algorithms involving neural
networks, or distance measurements such as nearest-neighbor classification and
clustering
• For distance-based methods, normalization helps prevent attributes with initially large
ranges (e.g., income) from outweighing attributes with initially smaller ranges (e.g.,
binary attributes).
• There are three methods for data normalization (a small code sketch follows the three
methods below):
1. Min-max normalization:
o Performs a linear transformation on the original data.
o Suppose that minA and maxA are the minimum and maximum values of an attribute A.
o Min-max normalization maps a value v of A to v′ in the range [new_minA, new_maxA]
by computing
v′ = ((v − minA) / (maxA − minA)) × (new_maxA − new_minA) + new_minA
o Min-max normalization preserves the relationships among the original data values.
2. Z-score normalization:
o Here the values for an attribute A are normalized based on the mean and standard
deviation of A.
o A value v of A is normalized to v′ by computing
v′ = (v − Ā) / σA
where Ā and σA are the mean and standard deviation of A, respectively.
o This method of normalization is useful when the actual minimum and maximum of
attribute A are unknown, or when there are outliers that dominate min-max
normalization.
3. Normalization by decimal scaling:
o Here the normalization is done by moving the decimal point of the values of attribute A.
o The number of decimal places moved depends on the maximum absolute value of A.
o A value v of A is normalized to v′ by computing
v′ = v / 10^j
where j is the smallest integer such that max(|v′|) < 1.
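The following small Java sketch implements the three methods above; the numeric values in main are illustrative only.

public class Normalization {

    // Min-max normalization into [newMin, newMax]
    static double minMax(double v, double min, double max, double newMin, double newMax) {
        return (v - min) / (max - min) * (newMax - newMin) + newMin;
    }

    // Z-score normalization using the attribute mean and standard deviation
    static double zScore(double v, double mean, double stdDev) {
        return (v - mean) / stdDev;
    }

    // Decimal scaling: divide by 10^j, where j is the smallest integer such that max(|v'|) < 1
    static double decimalScaling(double v, int j) {
        return v / Math.pow(10, j);
    }

    public static void main(String[] args) {
        System.out.println(minMax(73600, 12000, 98000, 0.0, 1.0)); // income 73,600 mapped to ~0.716
        System.out.println(zScore(73600, 54000, 16000));           // ~1.225 with mean 54,000, std dev 16,000
        System.out.println(decimalScaling(986, 3));                // 986 becomes 0.986 with j = 3
    }
}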
• Attribute construction:
o Here new attributes are constructed and added from the given set of attributes to help
the mining process.
o Attribute construction helps to improve the accuracy and understanding of structure in
high-dimensional data.
o By combining attributes, attribute construction can discover missing information about
the relationships between data attributes that can be useful for knowledge discovery.
For example, the structure of stored data may vary between applications, requiring
semantic mapping prior to the transformation process. Two applications might store the
same customer credit card information using slightly different field names and layouts.
To ensure that critical data isn’t lost when the two applications are integrated, information
from Application A needs to be reorganized to fit the data structure of Application B.
III. DATA CLEANING
• Real-world data tend to be incomplete, noisy, and inconsistent.
• Data cleaning routines attempt to fill in missing values, smooth out noise while
identifying outliers, and correct inconsistencies in the data.
• Data cleaning tasks include:
Fill in missing values (a small imputation sketch follows this list):
1. Ignore the tuple: This is usually done when the class label is missing
2. Fill in the missing value manually: this approach is time-consuming and may not be
feasible given a large data set with many missing values.
3. Use a global constant to fill in the missing value: Replace all missing attribute values by
the same constant
4. Use the attribute mean to fill in the missing value: Use a particular value to replace the
missing value for an attribute.
5. Use the attribute mean for all samples belonging to the same class as the given tuple:
replace the missing value with the mean value of the attribute for samples in the same
class as the given tuple.
6. Use the most probable value to fill in the missing value: This may be determined with
regression, inference-based tools using a Bayesian formalism, or decision tree induction.
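As a small illustration of option 4 above (attribute-mean imputation), the sketch below treats Double.NaN as "missing" and replaces it with the mean of the observed values; the data are invented for the example.

import java.util.Arrays;

public class MeanImputation {
    public static void main(String[] args) {
        double[] age = {23, 35, Double.NaN, 41, Double.NaN, 29};

        // Mean of the non-missing values
        double mean = Arrays.stream(age).filter(v -> !Double.isNaN(v)).average().orElse(0);

        // Replace each missing value with the attribute mean
        double[] cleaned = Arrays.stream(age).map(v -> Double.isNaN(v) ? mean : v).toArray();

        System.out.println(Arrays.toString(cleaned)); // [23.0, 35.0, 32.0, 41.0, 32.0, 29.0]
    }
}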
Identify outliers and smooth out noisy data:
• Noise is random error or variance in a measured variable. It can be smoothed using the
following techniques:
1. Binning: Binning methods smooth a sorted data value by consulting its “neighborhood,”
that is, the values around it.
2. Regression: Data can be smoothed by fitting the data to a function, such as with
regression. Linear regression involves finding the “best” line to fit two attributes (or
variables), so that one attribute can be used to predict the other. Multiple linear
regression is an extension of linear regression, where more than two attributes are
involved and the data are fit to a multidimensional surface.
3. Clustering: Outliers may be detected by clustering, where similar values are organized
into groups, or “clusters.”
7. Security and Authentication in Integrated Systems
A very common scenario nowadays is to have a lot of passwords to access banks, social
networks, online stores, Netflix, among thousands of other services. The same applies in
the business context, since in our companies we also have to control access credentials
for various systems such as email, ERP systems, ECM, Human Resources, etc. The
problem is both for users and for IT administrators who have to manage all these
password repositories.
Currently, for the personal context we have several alternatives for password control,
such as the Google Chrome password manager, which, when synchronized with our
Google user, allows us to store several passwords. In companies, the dream of users
and system administrators is to have a single password authenticated in one place (the
so-called Single Sign-on) that allows instant access to all the systems in the
organization.
To assist in this process there are several integrated authentication mechanisms,
which, as the name implies, integrate with the company’s various systems and confirm
the user’s identity without requiring credentials to be entered several times.
How do integrated authentication mechanisms work?
Basically these systems work as follows:
1. The integrated authentication system is configured against a repository of users and
passwords, which can be, for example, a password repository of your own or integrated
with an Active Directory (repository of users and computers of a Windows domain).
2. When a user accesses an application (in our example, the company ERP), the
application checks whether the user is already logged in. If not, it asks the integrated
authentication system: “Which user is logged in?”
3. The integrated authentication system checks whether there is an authenticated user. If
not, it requests the user’s credentials (user and password).
4. Once the user is logged into the integrated authentication system, it tells the application
which user is logged in.
5. The ERP application in our example receives the user and proceeds to give access to
the tool.
6. In case the user accesses another company application, such as the HR system, this
new application repeats step 2. However, since the user is already logged into the
system, it goes straight to step 4.
We can see that the operation is relatively simple: the user accesses the system, the
system accesses the integrated authentication, and the integrated authentication passes
the logged-on user to the system (and if necessary, collects the user’s credentials).
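Very roughly, each application can front its pages with a filter that implements steps 2 to 5: if no user is found in the session, redirect to the central login service, otherwise let the request through. The sketch below uses the plain Jakarta Servlet API; the login URL is a placeholder, and a real CAS or SAML client would additionally validate tickets or assertions rather than trusting the session alone.

import java.io.IOException;
import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;

public class SsoFilter implements Filter {
    private static final String LOGIN_URL = "https://sso.example.com/login"; // placeholder IdP address

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        // Step 2: the application checks whether a user is already logged in
        Object user = request.getSession().getAttribute("user");
        if (user == null) {
            // Steps 3-4: delegate authentication to the integrated authentication system
            response.sendRedirect(LOGIN_URL + "?service=" + request.getRequestURL());
            return;
        }
        // Step 5: a user is present, so the application grants access
        chain.doFilter(req, res);
    }
}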
Advantages of integrated authentication
Integrated authentication mechanisms also allow for a number of advantages, such as:
• User-friendly login interface, because the “login screen” will always be the same;
• Centralized password repository;
• Possibility to implement token authentication and multi-factor authentication (with SMS
messaging, for example) for all applications; and
• Implementation of policies (schedules, allowed stations, among others) that affect
routine access to the systems.
Integrated authentication systems and protocols
Here we will mention three:
• CAS
• SAML
• ADFS
CAS – Central Authentication Service
This is a single sign-on authentication protocol for the web, allowing users to access
multiple applications by entering their credentials only once. It basically follows the script
mentioned above: when the client accesses a system that requires authentication, it
redirects to the CAS, which authenticates the user and returns the user to the
application.
CAS was conceived and developed at Yale University and later became a project of
JASIG (Java in Administration Special Interest Group). It is currently maintained by the
Apereo Foundation. Among its main features we can mention:
1. Support for various authentication protocols (LDAP, SAML, and others);
2. Multi-factor authentication (password and SMS for example);
3. Password management and authentication policies.
SAML – Security Assertion Markup Language
Unlike CAS, SAML is an XML language for exchanging security information. It is
mentioned here because it is used by many integrated authentication mechanisms
(including CAS) to exchange information.
Basically it defines a standard XML message for the “conversations” between the
application and the authentication service. For example, it defines the pattern of what
information the IDP (Identity Provider) should return to the application according to what
the application requests.
This language is widespread and can work with many integrated, single sign-on
authentication systems such as CAS and Microsoft’s ADFS.
ADFS – Active Directory Federation Services
It is an integrated authentication system that can run on Windows servers and provides
single sign-on for applications located in the organization. ADFS uses the Windows
Active Directory as the identity provider and password repository and integrates with
systems via various protocols such as LDAP, SAML and others.
8. Error Handling and Resilience
Building resilient systems requires comprehensive error management. Errors can occur
in any part of the system or its ecosystem, and there are different ways of handling them,
for example:
• Data center — data center failure, where the whole DC could become unavailable due to
power failure, network connectivity failure, environmental catastrophe, etc. This is
addressed through monitoring and redundancy: redundancy in power, network, cooling
systems and possibly everything else relevant, and redundancy by building additional
data centers.
• Hardware — server/storage hardware and software faults such as disk failure, disk full,
other hardware failures, servers running out of allocated resources, server software
behaving abnormally, intra-DC network connectivity issues, etc. Again, the approach
here is the same: monitor the servers on various parameters and build redundancy.
There are various high-availability deployment patterns, and with the advent of
containerization combined with the power of DevOps, more efficient patterns for solving
this problem have also emerged. Architects and designers need to address the
availability aspects of their components while designing the system, in line with the
business need and the cost implications. In today's world, cloud providers usually take
care of the rest.
• Faults in applications — Irrespective of whether the application is deployed in the cloud
or on-premises, and irrespective of its technical stack, this is the responsibility of the
individual application teams. Cloud deployments will probably help reduce error
instances, and some technical stacks may be more mature than others, but errors will
occur and will need to be handled. With microservices-based distributed architectures,
this becomes even more interesting.
There are various steps in making applications resilient to faults:
• Minimizing errors by applying alternative architectural/design patterns. For example,
asynchronous handling of user requests may help avoid server overload and provide a
more consistent experience to users.
• Graceful error handling by the application.
• Raise an incident if needed — The important part here is reliably raising an incident
based on the need, and not letting user requests fall through the cracks. This is the
fallback scenario for applications when they are not able to handle the errors
themselves. While this can be used to address issues offline (and applications may not
always choose this as the route to solve the error at hand directly), even more
importantly it is a crucial step for offline analysis of errors and for taking preventive steps
against their recurrence.
Brief Note on the Patterns
There are multiple architectural patterns for addressing the fault resiliency of applications,
and a lot depends on the functional requirements and NFRs. The resiliency approach in
terms of design also depends on the architectural paradigm of the application: if it is
microservices based, a good amount of focus will be on errors related to microservices
integration dependencies. In event-based architectures, the focus will also be on
reliability in terms of processing idempotency and data loss when things go wrong, apart
from normal error handling. In synchronous, API-based applications, the application can
simply throw the error back to the caller, but some kind of monitoring/incident
management can be useful if the problem persists. In batch-based components, the
focus could be on the ability to restart or resume a batch in an idempotent manner.
Application Error Handling
Careful upfront thought about error handling in applications, as part of the design
process, is important. Even if the details are left for later, at a high level an approach
should be defined, which again may vary depending on the use cases and design of the
application.
Error Codes
How we define error codes is also an important part of error handling. There are no
general conventions/guidelines on error codes and every application/system goes about
its own way of defining error codes. However, error codes, if given some thought and
standardized across the organization, can help in a significant way. It's like having a
common definition that everyone in the organization understands. Further, error codes
that are self-explanatory and intuitive can increase productivity during resolution and
support offline analysis, for example of the most frequently occurring errors across
systems, errors occurring during peak loads, and the systems most susceptible to a
particular kind of error. This can then go a long way toward engineering taking long-term
mitigation actions on such errors, and it could even be a crucial metric in the overall
DevOps practice of the organization.
Error Handling
Below is an example on how one can go about handling errors in an application which is
based on events based architecture. Some of the steps mentioned could vary for other
architectural patterns.
Applications need to distinguish retryable errors from the non-retryable ones. If there is
something wrong with the input message itself, usually, there is no point retrying on such
an error unless there is a manual intervention. On the other hand, a problem with DB
connectivity is worth retrying.
When applications retry on errors, they can either choose a uniform retry configuration
across all retryable errors or fine-tune it with the help of an "error retry configuration". For
example, in the case of event-based services, a problem with the availability of an
infrastructure component can be given more time to be remediated before retrying, as
opposed to, let's say, a temporary issue related to concurrency. At the very least,
infrastructure errors are worth retrying for a longer duration; there is no point halting the
retry of the current event and consuming new events if the underlying infrastructure
unavailability will not allow those to be processed either.
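A bare-bones sketch of this distinction: a custom RetryableException marks errors worth retrying (a stand-in for, say, lost DB connectivity), retries use a simple fixed backoff, anything else fails fast, and exhausted retries are escalated. The exception type, attempt limit and backoff are illustrative assumptions, not a prescribed configuration.

public class RetryExample {

    // Marker for errors worth retrying (e.g. temporary infrastructure problems)
    static class RetryableException extends RuntimeException {
        RetryableException(String message) { super(message); }
    }

    static void processWithRetry(Runnable work, int maxAttempts, long backoffMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                work.run();
                return;                                  // success
            } catch (RetryableException e) {
                if (attempt == maxAttempts) {
                    // Retries exhausted: escalate (write an error alert / raise an incident)
                    System.err.println("Escalating after " + attempt + " attempts: " + e.getMessage());
                    throw e;
                }
                Thread.sleep(backoffMillis);             // fixed backoff; could grow per attempt
            }
            // Any non-retryable exception propagates immediately (bad input, business rule violations)
        }
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            processWithRetry(() -> {
                throw new RetryableException("DB connection refused"); // simulated transient fault
            }, 3, 500);
        } catch (RetryableException e) {
            System.err.println("Request parked for offline handling: " + e.getMessage());
        }
    }
}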
Raising Incidents
In the end, when all retries have failed, there needs to be a way to escalate the error and
raise an incident whenever needed. There are cases where the problem can simply be
thrown back to the user through notifications, leaving it to the user to resubmit the
request, but that leads to a bad user experience if the problem was due to some internal
technical issue. This is especially true in the case of event-based architectures.
Asynchronous integration patterns usually make use of a dead-letter queue (DLQ) as
another error handling pattern. However, the DLQ is only a temporary step in the overall
process. Whether through the DLQ or by other means, if there is a way to reliably
escalate the error so that it leads to the creation of an incident or the dispatch of an
operational alert, that is a desirable outcome. How can we design such an integration
with the incident management / alert management system? Here are a few options.
The first approach utilizes the logging facility, which is available in all applications and is
the path of least resistance, and a fairly assured one, for reporting an error. When an
application is done with all its retries and is trying to escalate an error, we need to give it
the most reliable path, with the least chance of further failure; logging fits that criterion
well. However, we want to separate these logs from all other error logs, otherwise the
incident management system will be flooded with errors that may not be relevant. We
call these logs "error alerts". Logging of these error alerts can be done by a dedicated
library/component whose job is to format the error alert with the maximum amount of
required information and log it in the required format. An example would be:
{
"logType": "ErrorAlert",
"errorCode": "subA.compA.DB.DatabaseA.Access_Error",
"businessObjectId": "234323",
"businessObjectName": "ACCOUNT",
"InputDetails": "<input object / event object>",
"InputContext": "any context info with the input",
"datetime": "date and time of the error",
"errorDetails": "error trace",
"..other info as needed": "..."
}
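A sketch of such a dedicated component is shown below: it assembles the alert as a JSON string with the same field names as the example above and writes it through SLF4J, leaving routing to the log aggregator. The class and its layout are assumptions for illustration; a production version would escape values properly and use a JSON library.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ErrorAlertLogger {
    private static final Logger LOG = LoggerFactory.getLogger(ErrorAlertLogger.class);

    // Formats an "error alert" so the log aggregator can route it towards incident creation
    public void alert(String errorCode, String businessObjectId, String businessObjectName,
                      String inputDetails, Throwable error) {
        String json = String.format(
            "{\"logType\":\"ErrorAlert\",\"errorCode\":\"%s\",\"businessObjectId\":\"%s\","
          + "\"businessObjectName\":\"%s\",\"InputDetails\":\"%s\",\"datetime\":\"%s\","
          + "\"errorDetails\":\"%s\"}",
            errorCode, businessObjectId, businessObjectName,
            inputDetails, java.time.Instant.now(), error.getMessage());
        LOG.error(json); // a real implementation would escape values and use a JSON library
    }

    public static void main(String[] args) {
        new ErrorAlertLogger().alert(
            "subA.compA.DB.DatabaseA.Access_Error", "234323", "ACCOUNT",
            "orderEvent-1001", new RuntimeException("connection refused"));
    }
}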
These logs are read by a log aggregator (which will typically already be in place as part
of the log monitoring stack most organizations employ). The log aggregator routes these
logs to a separate component whose job is to read these log events, consult a
configuration, and raise incidents or alerts as needed. Again, DLQ handling is needed
here if things go wrong, which will itself require monitoring and addressing.
Creation of an incident or alert requires some configuration so that a meaningful and
actionable incident can be created. The configuration attributes needed depend on the
specific incident management system employed by the organization, and the
configuration can also drive different kinds of actions. Given that the error codes follow a
particular taxonomy across the organization, this could very well become a central
configuration if needed.
The second approach is similar but is based on a DLQ: the error alert dispatcher
component writes to a DLQ instead of writing to logs. Everything else remains much the
same.
Which approach is better?
The log-based approach is more resilient from the application's point of view, but it has a
few shortcomings as well:
1. More moving parts/integrations before the error reaches the incident management
system; these will need to be handled.
2. Risk of log loss — this should be checked. If there is such a risk, then this approach is
not a good one. In general, log data is not treated as highly critical, but if we are using it
for raising incidents, it is worth verifying whether it can be relied upon. In the
implementation we went with, we realized there was a risk of losing log data at peak
volumes and hence had to discard this approach, but that may not be the case with all
logging environments.
The DLQ-based approach has its own pros and cons:
1. The primary, and possibly only, con is the step of connecting to the DLQ itself. Do we
need some kind of DLQ-over-DLQ on another messaging system as redundancy? That
chain could be endless; it depends on the criticality of the data.
2. Another con could be the number of message routers that need to connect to the
central bus for dispatching error alerts once all the applications in the organization are
combined. Some kind of federation may be needed, but that is where the solution starts
getting a little complex, with additional chances of error.
3. Everything else looks fine. There are fewer components to integrate otherwise, and
with bus-based integration there is higher reliability in the transmission of error alert
events.
9. Real-world Integration Case Studies
What is integration testing?
Integration testing is a type of software testing that verifies the interaction between
different components or modules of a software application. It is performed after unit
testing and before system testing.
Integration testing ensures that individual modules, when combined and integrated with
each other, work correctly as a system and meet the specified business requirements. It
plays a crucial role in ensuring that applications are reliable, scalable, and perform as
expected in real-world scenarios.
Why do we need integration testing?
Here are a few reasons why integration testing is necessary:
• Identify defects: It helps identify defects when different components of the system are
interacting with each other. This reduces the cost and effort required to fix issues later in
the development process.
• Validate system requirements: This type of testing also ensures that the system
requirements are met and the different components of the system work together as
expected.
• Ensure reliability and stability of the system: It helps ensure that the system is reliable
and stable under different scenarios and use cases. It can also include load and stress
testing to verify performance under various conditions.
• Enhance end-user experience: It helps ensure that the end-user experience is seamless,
with no disruptions or unexpected behavior due to poor integration between different
components of the system.
Integration testing case examples
Integration testing is a crucial aspect of software testing that involves testing the
interface links of different software components. Here are some real-world industry
applications of integration testing:
Healthcare Information System
A healthcare information system consists of several components, such as an electronic
medical record, a laboratory information system, a radiology information system, and a
customer relationship management solution. Integration testing ensures that the
components are correctly integrated and function as a unified system. Test results help
identify and fix critical issues, such as data inconsistencies and interoperability problems.
E-commerce Websites
E-commerce websites need to undergo integration testing since they use different
software components from different vendors. For instance, they use a third-party
payment gateway to facilitate payments. Testing ensures that different components are
working correctly as a single system. Testing helps in identifying and fixing several
issues, including payment failures, transaction errors, and security vulnerabilities.
Financial Management Software
Financial management software that comprises several modules, including accounting,
budgeting, and payroll, needs to undergo integration testing. This ensures that there are
no issues such as data inconsistencies, calculation errors, and reporting problems.
Real-world case study examples
Workday Integration with SAP
This is a real test case study about an Opkey client. Opkey’s client was using Workday
as HRMS and SAP for financial management. The Workday environment was not only
heavily integrated with SAP FICO, but the client also had to deal with Workday’s bi-
annual updates. All these things resulted in cumbersome and costly testing cycles. They
turned to Opkey to address these challenges.
Outlook Salesforce Integration
It allows you to connect your Microsoft Outlook email client to Salesforce CRM, enabling
you to access Salesforce data and functionality directly from within Outlook. This
integration can help you streamline your sales and marketing workflows by enabling you
to manage contacts, leads, opportunities, and cases, without having to switch between
multiple applications.
However, end-to-end testing is critical to ensure that both applications are serving the
business purpose. It is also needed to ensure the integrity of the data.
As you’re now aware of integration test case examples, let’s discuss why you need
automation.
Why do we want to automate our tests?
Manual integration testing is painfully slow and cumbersome. Furthermore, the testing
efforts will increase whenever new features are added. However, automated testing
speeds up the feedback process on changes as the tests can be executed quickly.
How to get started with integration test automation?
Here are the general steps to get started with integration test automation (a minimal
automated test sketch follows this list):
• Identify the integration points: Identify the components or systems that need to be
integrated and determine the interfaces between them. This will help you determine the
scope of your integration testing.
• Define the test scenarios: Define the test scenarios that need to be automated. These
scenarios should cover all relevant combinations of inputs and outputs.
• Choose the automation tools: Choose the automation tools and technologies that are
suitable for your integration testing needs. This may include tools for test case
management, test automation, and defect tracking.
• Create the test environment: Set up a test environment that closely resembles the
production environment. This will help you ensure that your test results are accurate and
reliable.
• Develop the test scripts: Develop the test scripts that automate your integration test
scenarios. This may involve programming skills and knowledge of the testing tools you
have chosen.
• Execute the tests: Execute the automated integration tests and verify the results. This
will help you identify any defects or issues that need to be addressed.
• Integrate with your CI/CD pipeline: Integrate your automated integration tests with
your continuous integration and delivery pipeline. This will help you ensure that your
tests are run automatically and regularly as part of your software development lifecycle.
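As a minimal sketch of what such an automated integration test might look like with JUnit 5, the OrderService and InMemoryPaymentGateway classes below are hypothetical stand-ins for two components being integrated; a real suite would exercise the actual deployed interfaces.

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class OrderPaymentIntegrationTest {

    // Hypothetical components under test: an order service wired to a payment gateway
    interface PaymentGateway { boolean charge(String orderId, double amount); }

    static class InMemoryPaymentGateway implements PaymentGateway {
        public boolean charge(String orderId, double amount) { return amount > 0; }
    }

    static class OrderService {
        private final PaymentGateway gateway;
        OrderService(PaymentGateway gateway) { this.gateway = gateway; }
        String placeOrder(String orderId, double amount) {
            return gateway.charge(orderId, amount) ? "CONFIRMED" : "REJECTED";
        }
    }

    @Test
    void orderIsConfirmedWhenPaymentSucceeds() {
        OrderService service = new OrderService(new InMemoryPaymentGateway());
        assertEquals("CONFIRMED", service.placeOrder("1001", 49.90));
    }

    @Test
    void orderIsRejectedWhenPaymentFails() {
        OrderService service = new OrderService(new InMemoryPaymentGateway());
        assertEquals("REJECTED", service.placeOrder("1002", 0.0));
    }
}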
What are automated integration testing challenges?
Automated integration testing can pose several challenges that need to be addressed in
order to ensure its effectiveness:
• Test environment setup: Automated integration testing requires a test environment that
closely resembles the production environment. Setting up and maintaining such an
environment can be challenging, especially for large and complex systems.
• Test data management: It requires a large amount of test data that closely resembles
the production data. This can be challenging when dealing with complex relationships or
dependencies between different components.
• Test script maintenance: It requires the creation and maintenance of test scripts that
cover all possible scenarios. This can be challenging when dealing with complex
systems like Oracle, Salesforce, or SAP that receive updates 4X, 3X, or 2X a year.
• Integration with other tools: Automated integration testing tools need to integrate with
other tools in the development and testing process, such as version control, continuous
integration, and defect tracking tools. This can be challenging when different tools use
different technologies or interfaces.
• Dependencies: It often involves testing the dependencies between different
components or subsystems. This can be challenging when dependencies are complex,
or when changes to one component or subsystem affect other components or
subsystems.
• Test coverage: It requires comprehensive test coverage to ensure that all possible
scenarios are tested. This can be challenging when dealing with large and complex
systems, and when there are time and resource constraints.
To address these challenges, you need a test automation platform:
• Zero-code automation: Opt for a test automation platform that can be operated easily
by business users with minimal training and support. A no-code test automation
platform helps them create automation scripts effortlessly.
• Self-healing capabilities: As stated above, maintaining test scripts can be a challenging
task, so opt for a test automation tool that comes packed with self-healing capabilities to
repair scripts without requiring human effort.
• Multiple technology support: Opt for a tool that supports multiple technologies so that
modules can be tested seamlessly regardless of technology.
• AI-powered risk-based coverage: Select a tool that supports risk-based coverage.
This will help ensure business continuity whenever new updates or functionality are
rolled out.
• Native integration with test management tools: Opt for a test automation tool that
natively integrates with test management tools, as this makes it easier to manage and
track test results, as well as to create and organize test cases. Additionally, an integrated
tool can help streamline the testing process and reduce the likelihood of errors or
inconsistencies.
How can Opkey help in integration testing?
Finding the right test automation platform is a challenging job. Among the available tools,
we would like to recommend Opkey, for the following reasons:
• Opkey is an industry-leading end-to-end test automation platform.
• It supports 14+ ERPs, including Oracle, SAP, Salesforce, Workday, and Dynamics 365,
along with 150+ packaged applications.
• Opkey is a zero-code platform that can be operated with minimal training. Furthermore,
it offers AI-powered change impact assessment and comes packed with self-healing
capabilities to make test script maintenance effortless.
• Lastly, it natively integrates with test management tools like JIRA, Jenkins, ALM, etc.