MODULE 4 OOR Dbms (Merrin)

in dbms

Uploaded by

rhattarde12
Copyright
© All Rights Reserved
MODULE 5

Object-Oriented Database
An OODBMS (object-oriented database management system) is a database management system based on
a data model in which data is stored in the form of objects, which are instances of classes. These
classes and objects together make up the object-oriented data model.

Components of Object-Oriented Data Model:


The OODBMS is based on three major components, namely: Object structure, Object classes,
and Object identity.

A. Object Structure:
The structure of an object refers to the properties that the object is made up of. These properties
are referred to as attributes. Thus, an object is a real-world entity with certain attributes that
make up the object structure. An object also encapsulates data and code into a single unit, which
in turn provides data abstraction by hiding the implementation details from the user.
The object structure is further composed of three types of components: Messages, Methods, and
Variables. These are explained below.

1. Messages –
A message provides an interface or acts as a communication medium between an object
and the outside world. A message can be of two types:
○ Read-only message: If the invoked method does not change the value of a
variable, then the invoking message is said to be a read-only message.
○ Update message: If the invoked method changes the value of a variable,
then the invoking message is said to be an update message.

2. Methods –
When a message is passed then the body of code that is executed is known as a method.
Whenever a method is executed, it returns a value as output. A method can be of two
types:
○ Read-only method: When the value of a variable is not affected by a
method, then it is known as the read-only method.
○ Update-method: When the value of a variable changes by a method, then
it is known as an update method.

3. Variables –
Variables store the data of an object. The data stored in the variables makes objects
distinguishable from one another.
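The difference between read-only and update messages can be sketched in a few lines of Python. This is only an illustration: the Account class, its balance variable and both method names are hypothetical and are not taken from any particular OODBMS.

```python
class Account:
    """Toy object with one variable and two methods (hypothetical example)."""

    def __init__(self, balance):
        # Variable: stores the data of the object.
        self._balance = balance

    def get_balance(self):
        # Read-only method: returns a value without changing any variable.
        return self._balance

    def deposit(self, amount):
        # Update method: changes the value of a variable.
        self._balance += amount
        return self._balance


acct = Account(100)
before = acct.get_balance()  # read-only message
after = acct.deposit(50)     # update message
```

Calling acct.get_balance() corresponds to sending a read-only message, while acct.deposit(50) corresponds to an update message because it changes the stored balance.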
B. Object Classes:
An object, which is a real-world entity, is an instance of a class. Hence we first define a
class, and then create objects that differ in the values they store but share the same class
definition. The objects in turn respond to the messages and hold the variables defined by their class.

Example –
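#include <string>
using std::string;

class CLERK
{
public:
    // variables
    string name;      // a name is a string, not a single char
    string address;
    int id;
    int salary;

    // methods (declarations only; the bodies would be defined elsewhere)
    string get_name();
    string get_address();
    int annual_salary();
};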
In the above example, CLERK is a class that declares the variables of its objects and the
methods that respond to messages.
An OODBMS also supports inheritance extensively, since a database may contain
many classes with similar methods, variables and messages. Thus, a class
hierarchy is maintained to capture the similarities among various classes.
Encapsulation, that is, data or information hiding, is also supported by the
object-oriented data model. This data model also provides abstract data types (ADTs)
in addition to built-in data types like char, int, float. ADTs are user-defined data types that
hold values and can also have methods attached to them.
Thus, an OODBMS provides numerous facilities to its users, both built-in and user-defined. It
combines the properties of an object-oriented data model with a database management system,
and supports programming concepts like classes and objects along with
encapsulation, inheritance, and user-defined ADTs (abstract
data types).
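As a rough illustration of a class hierarchy and a user-defined abstract data type, consider the Python sketch below. The Address, Employee and Clerk classes, their fields, and the assumption that salary is a monthly figure are all invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Address:
    """User-defined abstract data type: holds values and has a method."""
    street: str
    city: str

    def label(self):
        return f"{self.street}, {self.city}"


@dataclass
class Employee:
    name: str
    address: Address  # attribute whose type is the user-defined ADT
    salary: int       # assumed to be a monthly salary in this sketch

    def annual_salary(self):
        return self.salary * 12


@dataclass
class Clerk(Employee):
    """Subclass: inherits Employee's variables and methods and adds one."""
    desk_no: int = 0


clerk = Clerk("Asha", Address("12 Hill Road", "Pune"), 2000, desk_no=7)
```

Here Clerk reuses annual_salary() from Employee without redefining it, which is the kind of similarity a class hierarchy is meant to express.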
ODBMS stands for Object Database Management System. In ODBMS data is encapsulated
and represented in the form of objects. It relates the concept of object-oriented programming
with database systems. ODBMS grew out of research during the early 1970s as database support
for graph-structured objects. In comparison with RDBMS, where data is stored in tables with
rows and columns, ODBMS stores information as objects.

Characteristics
● Easy to link with programming language: The programming language and the
database schema use the same type definitions, so developers may not need to learn a
new database query language.
● No need for user defined keys: Object Database Management Systems have an
automatically generated OID associated with each of the objects.
● Easy modeling: ODBMS can easily model real-world objects, hence, are suitable for
applications with complex data.
● Can store non-textual data: ODBMS can also store audio, video and image data.
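The idea of system-generated object identifiers (OIDs) can be sketched with a small in-memory store. The ObjectStore class and its put/get methods are hypothetical names chosen for illustration; a real ODBMS manages OIDs internally and persists objects to disk.

```python
import itertools


class ObjectStore:
    """Hypothetical in-memory store that assigns an OID to every object."""

    def __init__(self):
        self._next_oid = itertools.count(1)
        self._objects = {}

    def put(self, obj):
        oid = next(self._next_oid)  # generated by the system, not by the user
        self._objects[oid] = obj
        return oid

    def get(self, oid):
        return self._objects[oid]   # direct lookup by identity


store = ObjectStore()
dept_oid = store.put({"name": "Accounts"})
clerk_oid = store.put({"name": "Asha", "dept": dept_oid})  # reference by OID

# Follow the stored reference instead of searching for a matching key value.
clerk_dept = store.get(store.get(clerk_oid)["dept"])
```

The clerk object refers to its department by OID, so reaching the department is a direct lookup rather than a search on a user-defined key.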
Advantages
● Speed: Access to data can be faster because an object can be retrieved directly
without a search, by following pointers.
● Improved performance: These systems are most suitable for applications that use
object-oriented programming.
● Extensibility: Unlike a traditional RDBMS, where the basic data types are hardcoded,
an ODBMS lets the user define any kind of data structure to hold the data.
● Data consistency: When ODBMS is integrated with an object-based application,
there is much greater consistency between the database and the programming
language since both use the same model of representation for the data. This helps
avoid the impedance mismatch.
● Capability of handling a variety of data: Unlike many other database management systems,
ODBMS can also store non-textual data such as images, video and audio.
Disadvantages:
● No universal standards: There are no universally agreed standards for operating an
ODBMS. This is the most significant drawback, as users are free to manipulate data
models as they want, which can be an issue when handling enormous amounts of data.
● Limited security features: Since use of ODBMS is very limited, security features
adequate for storing production-grade data are often lacking.
● Rapid increase in complexity: ODBMS become very complex very fast. When
there is a lot of data and a lot of relations between data, managing and optimizing
ODBMS becomes difficult.
● Scalability: ODBMS may be unable to support large systems.
● Query optimization is challenging: Optimizing ODBMS queries requires complete
information about the data, such as its type and size. This compromises the
data encapsulation that ODBMS is meant to offer.
Object Relational Model
An object-relational model is a combination of an object-oriented database model and a
relational database model. It supports objects, classes, inheritance etc. just like object-oriented
models, and has support for data types, tabular structures etc. like relational data
models.
One of the major goals of the object-relational data model is to close the gap between relational
databases and the object-oriented practices frequently used in many programming languages such
as C++, C#, Java etc.
Advantages of Object Relational model:
The advantages of the Object Relational model are −
● Inheritance
The Object Relational data model allows its users to inherit objects, tables etc. so that they can
extend their functionality. Inherited objects contain new attributes as well as the attributes that
were inherited.
● Complex Data Types
Complex data types can be formed using existing data types. This is useful in Object relational
data models as complex data types allow better manipulation of the data.
● Extensibility
The functionality of the system can be extended in the Object relational data model. This can be
achieved using complex data types as well as advanced concepts of object oriented models such
as inheritance.
Disadvantages of Object Relational model:
The object relational data model can get quite complicated and difficult to handle at times as it is
a combination of the Object oriented data model and Relational data model and utilizes the
functionalities of both of them.

Difference between RDBMS and OODBMS


RDBMS:
RDBMS stands for Relational Database Management System. It is a database management
system based on the relational model, i.e. the data and relationships are represented by a
collection of interrelated tables. It is a DBMS that enables the user to create, update, administer
and interact with a relational database. The relational model is the basis for SQL and for widely
used database systems such as MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access.
OODBMS:
OODBMS stands for Object-Oriented Database Management System. It is a DBMS where data
is represented in the form of objects, as used in object-oriented programming. OODB
implements object-oriented concepts such as classes of objects, object identity, polymorphism,
encapsulation, and inheritance. An object-oriented database can store more complex data than a
relational database. Some examples of OODBMS are Versant Object Database, Objectivity/DB,
ObjectStore, Caché and ZODB.
Logical Database
A Logical Database is a special type of ABAP (Advanced Business Application Programming)
program that is used to retrieve interrelated data from various tables. A logical database also
provides a read-only view of data.
Structure Of Logical Database:
A Logical Database uses only a hierarchical structure of tables, i.e. data is organized in a
tree-like structure and stored as records that are connected to each other through
edges (links). A Logical Database contains Open SQL statements which are used to read data from
the database. The logical database reads the data from the database, stores it in the program if
required, and passes it line by line to the application program.

Features of Logical Database:


In this section, let us look at some features of a logical database:
● We can select only the data that we need.
● Data authentication is done in order to maintain security.
● A Logical Database uses a hierarchical structure, which helps maintain data integrity.
Goal Of Logical Database:
The goal of Logical Database is to create well-structured tables that reflect the needs of the user.
The tables of the Logical database store data in a non-redundant manner and foreign keys will be
used in tables so that relationships among tables and entities will be supported.
Tasks Of Logical Database:
Below are some important tasks of a Logical Database:
● With the help of a Logical Database, we can read the same data from multiple
programs.
● A logical database defines the same user interface for multiple programs.
● A Logical Database ensures authorization checks for the centralized, sensitive
database.
● With the help of a Logical Database, performance is improved. For example, in a Logical
Database we can use joins instead of multiple SELECT statements, which
improves response time.
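The last point, using a join instead of multiple SELECT statements, can be illustrated outside ABAP with Python's built-in sqlite3 module. This is an analogy rather than logical-database code: the branch and student tables and their rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student (
        id INTEGER PRIMARY KEY,
        name TEXT,
        branch_id INTEGER REFERENCES branch(id)
    );
    INSERT INTO branch VALUES (1, 'CSE'), (2, 'ECE');
    INSERT INTO student VALUES (1, 'Ravi', 1), (2, 'Meena', 2), (3, 'Anil', 1);
""")

# Approach 1: one SELECT for the parents, then one SELECT per parent row.
nested = []
branches = conn.execute("SELECT id, name FROM branch ORDER BY id").fetchall()
for branch_id, branch_name in branches:
    rows = conn.execute(
        "SELECT name FROM student WHERE branch_id = ? ORDER BY id",
        (branch_id,),
    ).fetchall()
    nested.extend((branch_name, name) for (name,) in rows)

# Approach 2: a single join returns the same rows in one statement.
joined = conn.execute("""
    SELECT b.name, s.name
    FROM branch AS b JOIN student AS s ON s.branch_id = b.id
    ORDER BY b.id, s.id
""").fetchall()
```

Both approaches produce the same result set, but the join needs one statement where the nested version needs one statement per branch plus one more.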
Data View Of Logical Database:
A Logical Database provides a particular view of the Logical Database tables. A logical database is
appropriate when the structure of the database is large. To work with databases efficiently,
it is convenient to use the following flow:
● SELECT
● READ
● PROCESS
● DISPLAY
The data of the Logical Database is hierarchical in nature. The tables are linked to each other
by foreign key relationships.
Diagrammatically, the Data View of Logical Database is shown as:
Points To Remember:
● Tables must have Foreign Key Relationships.
● A logical Database consists of logically related tables that are arranged in a
hierarchical manner used for reading or retrieving Data.
● Logical Database consist of three main elements:
○ Structure of Database
○ Selections of Data from Database
○ Database Program
● If we want to improve the access time on data, then we use VIEWS in the Logical
Database.
Example:
Suppose that in a university or college, a HOD (head of department) wants to get information
about a specific student. To do so, the HOD first retrieves the data for the student's batch and
branch from a large amount of data, and can then easily read the information about the required
student, but cannot alter it.

Advantages Of Logical Database:


Let us look at some advantages of the logical database:
● In a Logical database, we can select meaningful data from a large amount of data.
● A Logical Database provides central authorization, which checks whether database
accesses are authorized.
● Less coding is required to retrieve data from the database compared
to other databases.
● Access performance of reading data from the hierarchical structure of the Database is
good.
● Easy to understand user interfaces.
● A Logical Database first runs check functions, which verify that user input is
complete, correct, and plausible.
Disadvantages Of Logical Database:
This section shows the disadvantages of the logical database:
● A Logical Database takes more time when the required data is at the lowest level of
the hierarchy, because all upper-level tables must be read first,
which slows down performance.
● In a Logical Database the ENDGET command doesn’t exist; as a result, the code block
associated with an event ends at the next event statement.

Web databases
A web database is essentially a database that can be accessed over a local network or the internet,
rather than one whose data is stored on a desktop or its attached storage. Used for both
professional and personal purposes, web databases are hosted on websites and are software as a
service (SaaS) products, which means that access is provided via a web browser.
One type of web database that you may be familiar with is the relational database.
Relational databases store data in groups (known as tables) and can link records together.
They use indexes and keys, which are added to the data, to locate information
stored in the database, enabling you to retrieve information quickly.
To paint a picture, think about shopping online and looking for a specific
product. Typing in keywords such as “black dress” makes all the black dresses stored on the
website appear in your browser, because the values “black”
and “dress” are stored in their database entries.
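The shopping example can be sketched with Python's built-in sqlite3 module. The product table, its color and kind columns, the index name and the sample rows are assumptions made for illustration; a real shop would have its own schema and search logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, color TEXT, kind TEXT, title TEXT)"
)
# An index on the searched columns lets the database locate matching rows quickly.
conn.execute("CREATE INDEX idx_product_color_kind ON product (color, kind)")
conn.executemany(
    "INSERT INTO product (color, kind, title) VALUES (?, ?, ?)",
    [
        ("black", "dress", "Evening dress"),
        ("red", "dress", "Summer dress"),
        ("black", "shoes", "Leather shoes"),
        ("black", "dress", "Wrap dress"),
    ],
)


def search(color, kind):
    """Return the titles of products matching both keywords."""
    rows = conn.execute(
        "SELECT title FROM product WHERE color = ? AND kind = ? ORDER BY id",
        (color, kind),
    )
    return [title for (title,) in rows]


results = search("black", "dress")
```

Only the two rows whose entries contain both "black" and "dress" come back; the red dress and the black shoes are filtered out.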
Some advantages of using a web database include:

1. Web database applications can be free or require payment, usually through monthly
subscriptions. Because of this, you pay only for the amount you use. So whether your business
shrinks or expands, your needs can be accommodated by adjusting the amount of server space. You
also don’t have to pay the cost of installing an entire software program.
2. The information is accessible from almost any device. Having data stored in the cloud
means that it is not tied to one computer. As long as you are granted access, you can
get hold of the data from just about any compatible device.
3. Web database programs usually come with their own technical support team so your IT
department can focus on other pressing company matters.
4. It’s convenient: web databases allow users to update information, so all you have to do is
create simple web forms.

Data Organization
Web databases enable collected data to be organized and cataloged thoroughly using hundreds
of parameters. A Web database does not require advanced computer skills, and many database
software programs provide an easy "click-and-create" style with no complicated coding. Fill in
the fields and save each record. Organize the data however you choose, such as chronologically,
alphabetically or by a specific set of parameters.
Web Database Software
Web database software is found within office productivity suites, such as
Microsoft Office Access and OpenOffice Base. Other programs include the Webex WebOffice
database and FormLogix Web database. The most advanced software applications can set up data
collection forms, polls and feedback forms, and present data analysis in real time.
Applicable Uses
Businesses both large and small can use Web databases to create website polls, feedback forms,
client or customer and inventory lists. Personal Web database use can range from storing
personal email accounts to a home inventory to personal website analytics. The Web database is
entirely customizable to an individual's or business's needs.

Securing the website-based database

Securing your website-based database is also of great importance, especially since hackers
access billions of organizational records every year. Protecting your systems isn’t a matter that’s
up for discussion; it’s a must.
Luckily, database management systems (DBMS) offer robust data encryption mechanisms. At the top
of that list is the use of complex algorithms for encrypting files. This approach makes
information unreadable to unauthorized users. When an authorized user needs access, the DBMS
decrypts the records to make them readable.
Passwords and private keys are further means of securing your web database. These
limit the people who can access the system, making it harder for attackers to
penetrate the website database.
A web application firewall (WAF) is another excellent option. It adds an extra layer of protection
to your systems and is effective at filtering bots, spam, and DDoS attacks. It is also
available at an affordable cost from CDN providers.

Distributed Database System


A distributed database is a database that is not limited to one system; it is spread over
different sites, i.e., on multiple computers or over a network of computers. A distributed database
system is located on various sites that don’t share physical components. This may be required
when a particular database needs to be accessed by various users globally. It needs to be
managed such that, to the users, it looks like one single database.
Types:
1. Homogeneous Database:
In a homogeneous database, all different sites store the database identically. The operating
system, database management system, and the data structures used – all are the same at all sites.
Hence, they’re easy to manage.
2. Heterogeneous Database:
In a heterogeneous distributed database, different sites can use different schemas and software,
which can lead to problems in query processing and transactions. Also, a particular site might be
completely unaware of the other sites. Different computers may use different operating systems
and different database applications. They may even use different data models for the database.
Hence, translations are required for different sites to communicate.

Distributed Data Storage :


There are 2 ways in which data can be stored on different sites. These are:
1. Replication –
In this approach, the entire relation is stored redundantly at 2 or more sites. If the entire
database is available at all sites, it is a fully redundant database. Hence, in replication, systems
maintain copies of data.
This is advantageous as it increases the availability of data at different sites. Also, query
requests can now be processed in parallel.
However, it has certain disadvantages as well. Data needs to be constantly updated. Any change
made at one site needs to be recorded at every site where that relation is stored, or else it may
lead to inconsistency. This is a lot of overhead. Also, concurrency control becomes much more
complex, as concurrent access now needs to be checked over a number of sites.
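The update overhead of replication can be seen in a small sketch. The ReplicatedRelation class and the site names are hypothetical, and the code ignores failures and concurrency control, which a real distributed DBMS has to handle.

```python
class ReplicatedRelation:
    """Toy relation stored redundantly at several sites (illustrative only)."""

    def __init__(self, site_names):
        # One full copy of the relation per site, keyed by primary key.
        self.sites = {name: {} for name in site_names}

    def write(self, key, row):
        # Every copy must record the change, or the copies become inconsistent.
        for copy in self.sites.values():
            copy[key] = dict(row)
        return len(self.sites)  # number of copies that had to be updated

    def read(self, site, key):
        # A read can be served by any single site that holds a copy.
        return self.sites[site][key]

    def is_consistent(self):
        copies = list(self.sites.values())
        return all(copy == copies[0] for copy in copies)


relation = ReplicatedRelation(["mumbai", "delhi", "pune"])
relation.write(1, {"name": "Asha", "salary": 2000})
copies_touched = relation.write(1, {"name": "Asha", "salary": 2500})
```

A single logical update touches every site that stores the relation, while a read needs only one site, which is the trade-off between availability and update overhead described above.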
2. Fragmentation –
In this approach, the relations are fragmented (i.e., they’re divided into smaller parts) and each of
the fragments is stored in different sites where they’re required. It must be made sure that the
fragments are such that they can be used to reconstruct the original relation (i.e, there isn’t any
loss of data).
Fragmentation is advantageous because it doesn’t create copies of data, so consistency is not a
problem.
Fragmentation of relations can be done in two ways:

● Horizontal fragmentation – Splitting by rows –
The relation is fragmented into groups of tuples so that each tuple is assigned to at
least one fragment.
● Vertical fragmentation – Splitting by columns –
The schema of the relation is divided into smaller schemas. Each fragment must
contain a common candidate key so as to ensure a lossless join.
In certain cases, an approach that is a hybrid of fragmentation and replication is used.
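Both kinds of fragmentation, and the requirement that the original relation can be reconstructed, can be sketched with plain Python lists and dictionaries. The employees relation, its columns and the choice of id as the candidate key are assumptions for this example.

```python
employees = [
    {"id": 1, "name": "Asha", "city": "Pune", "salary": 2000},
    {"id": 2, "name": "Ravi", "city": "Delhi", "salary": 2500},
    {"id": 3, "name": "Meena", "city": "Pune", "salary": 3000},
]

# Horizontal fragmentation: split by rows; every tuple lands in a fragment.
pune_rows = [row for row in employees if row["city"] == "Pune"]
other_rows = [row for row in employees if row["city"] != "Pune"]
# Reconstruction is the union of the fragments.
rebuilt_horizontal = sorted(pune_rows + other_rows, key=lambda row: row["id"])

# Vertical fragmentation: split by columns; both fragments keep the key "id".
personal = [{"id": r["id"], "name": r["name"], "city": r["city"]} for r in employees]
payroll = [{"id": r["id"], "salary": r["salary"]} for r in employees]
# Reconstruction is a join of the fragments on the common candidate key.
payroll_by_id = {row["id"]: row for row in payroll}
rebuilt_vertical = [{**p, **payroll_by_id[p["id"]]} for p in personal]
```

Because the vertical fragments share the key column, joining them on id gives back exactly the original tuples, which is what a lossless join means here.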
Applications of Distributed Database:
● It is used in the Corporate Management Information System.
● It is used in multimedia applications.
● Used in Military’s control system, Hotel chains etc.
● It is also used in manufacturing control systems.

Advantages of Distributed database


Distributed databases bring the advantages of distributed computing to the
database management domain. We can define a distributed database as a collection of
multiple interrelated databases distributed over a computer network, and a distributed database
management system as a software system that manages a distributed database while
making the distribution transparent to the user.
Distributed database management has been proposed for various reasons, ranging from organizational
decentralization and economical processing to greater autonomy. Some of these advantages are
as follows:
1. Management of data with different levels of transparency –
Ideally, a database should be distribution transparent in the sense of hiding the details of where
each file is physically stored within the system. The following types of transparency are
possible in a distributed database system:
● Network transparency:
This refers to freeing the user from the operational details of the
network. It is of two types: location transparency and naming transparency.
● Replication transparency:
It makes users unaware of the existence of copies, since copies of
data may be stored at multiple sites for better availability, performance and reliability.
● Fragmentation transparency:
It makes users unaware of the existence of fragments, whether they result from
vertical or horizontal fragmentation.

2. Increased Reliability and availability –


Reliability is defined as the probability that a system is running at a certain time,
whereas availability is defined as the probability that the system is continuously available during
a time interval. When the data and DBMS software are distributed over several sites, one site may
fail while other sites continue to operate; only the data that exist at
the failed site cannot be accessed. This leads to improved reliability and availability.
3. Easier Expansion –
In a distributed environment, expansion of the system in terms of adding more data, increasing
database sizes, or adding more processors is much
easier.
4. Improved Performance –
We can achieve interquery parallelism by executing multiple queries at different
sites, and intraquery parallelism by breaking up a query into a number of subqueries that execute
in parallel. Both lead to improved performance.
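Intraquery parallelism can be sketched by splitting one aggregate query into subqueries, one per site, and combining the partial results. The three sites and their salary fragments are invented, and threads on one machine stand in for separate sites; the sketch shows the structure of the computation only, not real distributed execution.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments of a salary column, one per site.
fragments = {
    "site_a": [2000, 2500],
    "site_b": [3000],
    "site_c": [1500, 1000],
}


def subquery_total(site):
    """Subquery executed at one site: total salary of the local fragment."""
    return sum(fragments[site])


# Run the subqueries concurrently, one worker standing in for each site.
with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
    partial_totals = list(pool.map(subquery_total, fragments))

# Combine the partial results to answer the original query.
total_salary = sum(partial_totals)
```

The original query (total salary over the whole relation) is answered by summing the per-site totals, so no site has to scan data held elsewhere.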

Data Warehouse
A Data Warehouse is a relational database management system (RDBMS) constructed to meet the
requirements of query and analysis rather than transaction processing. It can be loosely described
as any centralized data repository which can be queried for business benefits. It is a database
that stores information oriented to satisfy decision-making requests. It is a group of decision
support technologies that aims to enable the knowledge worker (executive, manager, and analyst) to
make better and faster decisions. So, Data Warehousing supports architectures and tools for
business executives to systematically organize, understand and use their information to make
strategic decisions.

The Data Warehouse environment contains an extraction, transformation, and loading (ETL)
solution, an online analytical processing (OLAP) engine, customer analysis tools, and other
applications that handle the process of gathering information and delivering it to business users.

What is a Data Warehouse?

A Data Warehouse (DW) is a relational database that is designed for query and analysis rather
than transaction processing. It includes historical data derived from transaction data from single
and multiple sources. A Data Warehouse provides integrated, enterprise-wide, historical data and
focuses on providing support for decision-makers for data modeling and analysis. A Data
Warehouse is a group of data specific to the entire organization, not only to a particular group of
users. It is not used for daily operations and transaction processing but used for making
decisions.

A Data Warehouse can be viewed as a data system with the following attributes:

● It is a database designed for investigative tasks, using data from various applications.
● It supports a relatively small number of clients with relatively long interactions.
● It includes current and historical data to provide a historical perspective of information.
● Its usage is read-intensive.
● It contains a few large tables.

"Data Warehouse is a subject-oriented, integrated, and time-variant store of information in
support of management's decisions."

Characteristics of Data Warehouse

Subject-Oriented

A data warehouse focuses on the modeling and analysis of data for decision-makers. Therefore,
data warehouses typically provide a concise and straightforward view around a particular
subject, such as customer, product, or sales, rather than the organization's ongoing
operations as a whole. This is done by excluding data that are not useful concerning the subject
and including all data needed by the users to understand the subject.

Integrated

A data warehouse integrates various heterogeneous data sources like RDBMS, flat files, and
online transaction records. Data cleaning and integration must be performed during data
warehousing to ensure consistency in naming conventions, attribute types, etc., among different
data sources.
Time-Variant

Historical information is kept in a data warehouse. For example, one can retrieve data from 3
months, 6 months, 12 months, or even earlier from a data warehouse. This contrasts
with a transaction system, where often only the most current data is kept.

Non-Volatile

The data warehouse is a physically separate data store, which is transformed from the source
operational RDBMS. The operational updates of data do not occur in the data warehouse, i.e.,
update, insert, and delete operations are not performed. It usually requires only two procedures in
data accessing: initial loading of data and access to data. Therefore, the DW does not require
transaction processing, recovery, and concurrency capabilities, which allows for substantial
speedup of data retrieval. Non-volatile means that, once entered into the warehouse, data
should not change.

Goals of Data Warehousing

● To help reporting as well as analysis
● Maintain the organization's historical information
● Be the foundation for decision making.

Need for Data Warehouse

Data Warehouse is needed for the following reasons:

1) Business users: Business users require a data warehouse to view summarized data from the
past. Since these people are non-technical, the data may be presented to them in a simple
form.
2) Store historical data: A Data Warehouse is required to store time-variant data from the
past. This data can be used for various purposes.
3) Make strategic decisions: Some strategies may depend upon the data in the data
warehouse. So, data warehouses contribute to making strategic decisions.
4) For data consistency and quality: By bringing the data from different sources to a
common place, the user can effectively bring uniformity and consistency to the data.
5) Quick response time: Data warehouses have to be ready for somewhat unexpected loads and
types of queries, which demands a significant degree of flexibility and quick response time.

Benefits of Data Warehouse

1. Understand business trends and make better forecasting decisions.
2. Data Warehouses are designed to handle enormous amounts of data.
3. The structure of data warehouses is easier for end-users to navigate,
understand, and query.
4. Queries that would be complex in many normalized databases could be easier to build
and maintain in data warehouses.
5. Data warehousing is an efficient method to manage demand for lots of information from
lots of users.
6. Data warehousing provides the capabilities to analyze a large amount of historical data.

Components or Building Blocks of Data Warehouse

Architecture is the proper arrangement of the elements. We build a data warehouse with software
and hardware components. To suit the requirements of our organization, we arrange these
building blocks in a particular way. We may also want to strengthen one part with extra tools and
services. All of these depend on our circumstances.

The figure shows the essential elements of a typical warehouse. We see the Source Data
component shown on the left. The Data Staging element serves as the next building block. In the
middle, we see the Data Storage component that handles the data warehouse's data. This element
not only stores and manages the data; it also keeps track of data using the metadata repository.
The Information Delivery component on the right consists of all the different ways of making the
information from the data warehouse available to the users.
Source Data Component
Source data coming into the data warehouse may be grouped into four broad categories:
Production Data: This type of data comes from the different operational systems of the enterprise.
Based on the data requirements in the data warehouse, we choose segments of the data from the
various operational systems.
Internal Data: In each organization, users keep their "private" spreadsheets, reports,
customer profiles, and sometimes even departmental databases. This is the internal data, part of
which could be useful in a data warehouse.
Archived Data: Operational systems are mainly intended to run the current business. In every
operational system, we periodically take the old data and store it in archived files.
External Data: Most executives depend on information from external sources for a large
percentage of the information they use. They use statistics associated with their industry
produced by external agencies.

Data Staging Component


After we have extracted data from various operational systems and external sources, we have to
prepare the files for storing in the data warehouse. The extracted data coming from several
different sources need to be changed, converted, and made ready in a format that is relevant to be
saved for querying and analysis.
We will now discuss the three primary functions that take place in the staging area.
1) Data Extraction: This method has to deal with numerous data sources. We have to employ
the appropriate techniques for each data source.
2) Data Transformation: As we know, data for a data warehouse comes from many different
sources. If data extraction for a data warehouse poses big challenges, data transformation
presents even significant challenges. We perform several individual tasks as part of data
transformation.
First, we clean the data extracted from each source. Cleaning may involve correcting
misspellings, providing default values for missing data elements, or eliminating
duplicates when we bring in the same data from various source systems. Standardization of
data elements forms a large part of data transformation. Data transformation also involves many
forms of combining pieces of data from different sources. We combine data from single source
records or related data parts from many source records. On the other hand, data transformation
also involves purging source data that is not useful and separating out source records into new
combinations. Sorting and merging of data take place on a large scale in the data staging area.
When the data transformation function ends, we have a collection of integrated data that is
cleaned, standardized, and summarized.
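
The cleaning tasks described above can be illustrated with a minimal sketch. The record layout
(name and city fields), the source names, and the default value "UNKNOWN" are assumptions made
only for this example; a real staging area would apply rules specific to each source system.

```python
def transform(records, default_city="UNKNOWN"):
    """Clean, standardize, and de-duplicate extracted customer records."""
    seen = set()
    cleaned = []
    for rec in records:
        # Standardize: trim whitespace and use one capitalization style.
        name = (rec.get("name") or "").strip().title()
        # Provide a default value for a missing data element.
        city = (rec.get("city") or default_city).strip().upper()
        key = (name, city)
        # Eliminate duplicates that arrive from different source systems.
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "city": city})
    # Sort so the output is ready for merging and loading.
    return sorted(cleaned, key=lambda r: (r["name"], r["city"]))


# Hypothetical extracts from two source systems.
crm_rows = [{"name": " alice smith ", "city": "pune"}, {"name": "Bob Rao", "city": None}]
billing_rows = [{"name": "Alice Smith", "city": "Pune "}]

staged = transform(crm_rows + billing_rows)
for row in staged:
    print(row)
```

In this sketch the two spellings of the same customer collapse into one record, and the missing
city is filled with the default value.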
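3) Data Loading: Two distinct categories of tasks form the data loading function. When we
complete the design and construction of the data warehouse and go live for the first time, we
do the initial loading of the information into the data warehouse storage. The initial load moves
high volumes of data and uses up a substantial amount of time. Once the data warehouse is in
operation, we continue to extract the changes to the source data, transform them, and load these
incremental revisions on an ongoing basis.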

Data Mining
Data mining is one of the most useful techniques that help entrepreneurs, researchers, and
individuals extract valuable information from huge sets of data. Data mining is also called
Knowledge Discovery in Databases (KDD). The knowledge discovery process includes data
cleaning, data integration, data selection, data transformation, data mining, pattern evaluation,
and knowledge presentation.

Data mining is the process of extracting information from huge sets of data to identify patterns,
trends, and useful data that allow a business to make data-driven decisions.

In other words, data mining is the process of examining data from various perspectives to
uncover hidden patterns and categorize them into useful information. This information is
collected and assembled in particular areas, such as data warehouses, where it supports efficient
analysis, data mining algorithms, and decision making, with the eventual aim of cutting costs
and generating revenue.
Data mining is the act of automatically searching large stores of information to find trends
and patterns that go beyond simple analysis procedures. Data mining uses complex
mathematical algorithms to segment the data and evaluate the probability of future events.

Data Mining is a process used by organizations to extract specific data from huge databases to
solve business problems. It primarily turns raw data into useful information.

Data mining is similar to data science: it is carried out by a person, in a specific situation, on a
particular data set, with an objective. The process includes various types of services such as text
mining, web mining, audio and video mining, pictorial data mining, and social media mining. It
is done through software that may be simple or highly specialized. By outsourcing data mining,
all the work can be done faster and with lower operating costs. Specialized firms can also use new
technologies to collect data that is impossible to locate manually. A vast amount of information
is available on various platforms, but very little of it is accessible as knowledge. The biggest
challenge is to analyze the data to extract important information that can be used to solve a
problem or to develop the company. Many powerful tools and techniques are available to mine
data and gain better insight from it.

Types of Data Mining

Data mining can be performed on the following types of data:

Relational Database:

A relational database is a collection of multiple data sets formally organized into tables, records,
and columns, from which data can be accessed in various ways without having to reorganize the
database tables. Tables convey and share information, which facilitates data searchability,
reporting, and organization.

Data Warehouses:

A data warehouse is a technology that collects data from various sources within the
organization to provide meaningful business insights. The huge amount of data comes from
multiple places such as Marketing and Finance. The extracted data is used for analytical
purposes and helps in decision-making for a business organization. The data warehouse is
designed for the analysis of data rather than transaction processing.

Data Repositories:

A data repository generally refers to a destination for data storage. However, many IT
professionals use the term more specifically to refer to a particular kind of setup within an IT
structure, for example, a group of databases in which an organization keeps various kinds of
information.

Object-Relational Database:

A combination of an object-oriented database model and a relational database model is called an
object-relational model. It supports classes, objects, inheritance, etc. One of the primary
objectives of the object-relational data model is to close the gap between the relational database
and the object-oriented practices frequently used in many programming languages, for
example, C++, Java, and C#.

Transactional Database:

A transactional database refers to a database management system (DBMS) that can
undo a database transaction if it is not completed properly. Although this was once a unique
capability, today most relational database systems support
transactional database activities.
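
The ability to undo a transaction can be sketched with Python's built-in sqlite3 module. The
account table, its CHECK constraint, and the transfer function are hypothetical names chosen
for this illustration; they are not part of any particular system discussed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount between accounts; undo every step if any step fails."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        # This step violates the CHECK constraint when funds are insufficient.
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # Roll back the credit that was already applied to dst.
        conn.rollback()
        return False


ok = transfer(conn, 1, 2, 500)
balances = dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
print(ok, balances)
```

The failed transfer leaves both balances exactly as they were, even though the first update had
already run before the error occurred.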

Advantages of Data Mining

● The data mining technique enables organizations to obtain knowledge-based data.
● Data mining enables organizations to make profitable adjustments in operation and
production.
● Compared with other statistical data applications, data mining is cost-efficient.
● Data mining helps the decision-making process of an organization.
● It facilitates the automated discovery of hidden patterns as well as the prediction of
trends and behaviors.
● It can be introduced into new systems as well as existing platforms.
● It is a quick process that makes it easy for new users to analyze enormous amounts of
data in a short time.

Disadvantages of Data Mining

● There is a risk that organizations may sell useful customer data to other
organizations for money. For example, American Express has reportedly sold its customers'
credit card purchase data to other organizations.
● Much data mining analytics software is difficult to operate and requires advanced training to
work with.
● Different data mining tools operate in distinct ways because of the different algorithms
used in their design. Therefore, selecting the right data mining tool is a very
challenging task.
● Data mining techniques are not perfectly precise, which may lead to severe consequences in
certain conditions.

Data Mining Applications

Data mining is primarily used by organizations with intense consumer demands, such as retail,
communication, financial, and marketing companies, to determine prices, consumer preferences,
product positioning, and the impact on sales, customer satisfaction, and corporate profits. Data
mining enables a retailer to use point-of-sale records of customer purchases to develop products
and promotions that help the organization attract customers.
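
As a rough illustration of how point-of-sale records might feed promotions, the sketch below
counts which pairs of products are bought together. The sample baskets and the min_count
threshold are invented for this example, and simple pair counting stands in for the more
elaborate association-rule methods used in practice.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_count=2):
    """Count how often each pair of products is bought together."""
    pair_counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so (bread, milk) and (milk, bread) match.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    # Keep only pairs seen in at least min_count baskets.
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


# Hypothetical point-of-sale records, one list per customer purchase.
sales = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "eggs"],
    ["bread", "jam"],
]

pairs = frequent_pairs(sales)
print(pairs)
```

A retailer could use pairs that occur often as candidates for a joint promotion or for placing
the products near each other.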

Challenges of Implementation in Data Mining

Incomplete and noisy data:

Data mining is the process of extracting useful data from large volumes of data. Data in the
real world is heterogeneous, incomplete, and noisy. Data in huge quantities will often be
inaccurate or unreliable. These problems may be caused by faulty measuring instruments or by
human error. Suppose a retail chain collects the phone numbers of customers who spend more
than $500, and the accounting employees enter the information into their system. An employee
may mistype a digit when entering a phone number, which results in incorrect data. Some
customers may not be willing to disclose their phone numbers, which results in incomplete
data. The data could also get changed due to human or system error. All these problems (noisy
and incomplete data) make data mining challenging.
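
The phone-number example can be sketched as a simple quality check that separates incomplete
entries from noisy ones. The 10-digit rule and the sample entries are assumptions for
illustration only; valid lengths differ by country.

```python
import re


def classify_phone(raw):
    """Label a phone entry as 'missing', 'noisy', or 'ok'."""
    # Incomplete data: the customer did not disclose a number.
    if raw is None or not raw.strip():
        return "missing"
    digits = re.sub(r"\D", "", raw)
    # Assumption: a valid number has exactly 10 digits.
    return "ok" if len(digits) == 10 else "noisy"


entries = ["555-867-5309", "555-867-530", None, "  "]
labels = [classify_phone(e) for e in entries]
print(labels)
```

A check like this only catches entries with the wrong number of digits. A mistyped digit that
keeps the length unchanged still looks valid, which is part of what makes noisy data hard to
handle.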

Data Distribution:

Real-world data is usually stored on various platforms in a distributed computing environment.
It might be in a database, on individual systems, or even on the internet. In practice, it is quite
difficult to bring all the data into a centralized data repository, mainly because of organizational
and technical concerns. For example, various regional offices may have their own servers to store
their data. It may not be feasible to store all the data from all the offices on a central server.
Therefore, data mining requires the development of tools and algorithms that allow the mining of
distributed data.
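
One way to picture mining distributed data is for each regional office to summarize its own
data locally and send only the summary to a central site. The sketch below assumes this
approach; the office names and purchase lists are hypothetical, and counting items is a
deliberately simple stand-in for a real mining algorithm.

```python
from collections import Counter


def local_counts(purchases):
    """Run at each regional office: summarize local data only."""
    return Counter(purchases)


def merge_counts(partials):
    """Run centrally: combine the small summaries, not the raw data."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical purchase logs held on two regional servers.
north = local_counts(["tea", "coffee", "tea"])
south = local_counts(["coffee", "coffee", "juice"])

overall = merge_counts([north, south])
top_item, top_count = overall.most_common(1)[0]
print(top_item, top_count)
```

Only the per-office counts cross the network, so the raw records can stay on the regional
servers.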

Complex Data:

Real-world data is heterogeneous, and it could be multimedia data, including audio, video, and
images, as well as other complex data such as spatial data, time series, and so on. Managing these
various types of data and extracting useful information is a tough task. Most of the time, new
technologies, tools, and methodologies have to be developed or refined to obtain specific
information.

Performance:

The data mining system's performance relies primarily on the efficiency of algorithms and
techniques used. If the designed algorithm and techniques are not up to the mark, then the
efficiency of the data mining process will be affected adversely.

Data Privacy and Security:

Data mining often raises serious issues in terms of data security, governance, and privacy.
For example, if a retailer analyzes the details of purchased items, it reveals data about
the buying habits and preferences of customers without their permission.

Data Visualization:

In data mining, data visualization is a very important process because it is the primary method
of showing the output to the user in a presentable way. The extracted data should convey the
exact meaning of what it intends to express. But many times, representing the information to the
end user in a precise and easy way is difficult. Because both the input data and the output
information are complicated, efficient and effective data visualization processes need to be
implemented.
