Database System
Database
A database is used by an organization as an electronic way to store, manage, and retrieve information. A database has the ability to organize, process, and manage information in a structured and controlled manner.
DBMS
A DBMS (Database Management System) is the software through which users create, manage, and query databases.
File-oriented approach vs. database approach
Sharing of data: In the database approach, data sharing is easy because of centralization. In the file system, data is distributed across many files, possibly in different formats, so it is not easy to share data.

Data abstraction: A DBMS gives an abstract view of data that hides the details. The file system exposes the details of data representation and storage.

Security and protection: A DBMS provides a good protection mechanism. It is not easy to protect a file under the file system.

Recovery mechanism: A DBMS provides a crash-recovery mechanism, i.e., it protects the user from system failure. The file system has no such mechanism: if the system crashes while data is being entered, the contents of the file may be lost.

Manipulation techniques: A DBMS contains a wide variety of sophisticated techniques to store and retrieve data. The file system cannot store and retrieve data efficiently.

Concurrency problems: A DBMS handles concurrent access to data using some form of locking. In the file system, concurrent access causes many problems, such as reading a file while another user is deleting or updating information in it.

Where to use: The database approach is used in large systems that interrelate many files. The file system approach is used in smaller systems with fewer interrelated files.

Cost: A database system is expensive to design. The file system approach is cheaper to design.

Data redundancy and inconsistency: Because the database is centralized, the problems of data redundancy and inconsistency are controlled. In the file system, files and application programs are created by different programmers, so there is a lot of duplication of data, which may lead to inconsistency.

Structure: The database structure is complex to design. The file system approach has a simple structure.

Data independence: In a database system, data independence exists and can be of two types: logical data independence and physical data independence. In the file system approach, there is no data independence.

Data models: In the database approach, three types of data models exist: hierarchical, network, and relational. In the file system approach, there is no concept of data models.

Flexibility: Changes to the content of the data stored in any system are often a necessity, and such changes are made more easily with a database approach. The file system is less flexible than the DBMS approach.
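The contrast can be sketched in Python, using the built-in sqlite3 module as a stand-in DBMS and a plain text file for the file-oriented approach (the file layout and the student data here are illustrative assumptions):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "students.txt")

# File-oriented approach: the application itself must know the file layout
# (comma-separated, name first -- an assumed format) and scan every line.
with open(path, "w") as f:
    f.write("Alice,22\nBob,25\n")
with open(path) as f:
    adults = [ln.split(",")[0] for ln in f
              if int(ln.strip().split(",")[1]) >= 25]

# Database approach: the DBMS hides storage details behind a query language.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [("Alice", 22), ("Bob", 25)])
rows = [r[0] for r in conn.execute("SELECT name FROM student WHERE age >= 25")]

print(adults, rows)  # ['Bob'] ['Bob']
```

Both paths find the same answer, but only the file-based code breaks if the file format ever changes; the query stays valid regardless of how the DBMS stores the rows.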
A. Entity-Relationship (ER) Data Model
• Diagrammatic representation.
• Simplicity in design.
• Defines data and their relations.
• Useful for database designers.
• Application independent.
• Shared by different applications.
• No duplicate representation of data.
• Consistency and structural validity.
Advantages
• It is simple and easily understandable through simple diagrams.
• Easy to convert ER diagrams into a record-based data model.
• Easy to understand.
Disadvantages
• No standard representation for ER diagrams.
• The representation is flexible and depends upon the designer.
• Suited to high-level design; cannot be used at low levels (like coding).
B. Object-Oriented Data Models
• Represents real-world objects.
• Based on the collection of objects, attributes, and their relationship.
• Consider each object in the world as an object and isolate it from the
other.
• Use inherits, encapsulation, abstraction properties.
• Mainly used for multimedia applications as well as data with complex relationships.
Example: In an Employee database we have different types of employees: Engineer, Accountant, Manager, and Clerk. All these employees belong to the Person group. A person can have attributes like name, address, age, and phone.
All employees inherit the attributes and functionality of Person, so we can reuse those features in Employee.
This feature of the model is called inheritance.
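The Person/Employee example above can be sketched in Python. The Person attributes follow the text; the Engineer subclass and its specialty attribute are illustrative assumptions:

```python
class Person:
    """Common attributes shared by every kind of employee."""
    def __init__(self, name, address, age, phone):
        self.name = name
        self.address = address
        self.age = age
        self.phone = phone

class Engineer(Person):
    """Inherits Person's attributes and adds its own (inheritance)."""
    def __init__(self, name, address, age, phone, specialty):
        super().__init__(name, address, age, phone)  # reuse Person's code
        self.specialty = specialty  # attribute specific to Engineer

e = Engineer("Asha", "12 Park Rd", 30, "555-0101", "civil")
print(e.name, e.specialty)  # Person's attributes are reused, not redefined
```

Accountant, Manager, and Clerk would be further subclasses of Person in the same way, each adding only what is specific to it.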
Advantages
• Re-use of attributes and functionality (code) through inheritance.
• Reduced overhead and maintenance costs, since the same data need not be maintained multiple times.
• No fear of misuse by other objects.
• Easily add a new class, which is inherited from a parent class, and add
new features.
• More flexible.
• Each class binds its attributes and its functionality.
Disadvantages
• Not widely developed or complete enough for use in database systems, so not widely accepted by users.
• It is an approach for solving the requirement, not a technology.
C. Hierarchical Data Model
The diagram shows CLASS as the parent table with two child tables, STUDENT and SUBJECT.
Characteristics
• This model has two main concepts: the record and the parent-child relationship.
Advantages
1. Integrity of data: each child node can be linked with only one parent, and a child node can be accessed (read) only through its parent.
2. Data security: in this model, accessing, updating, or deleting a child node requires the proper information about its parent node; otherwise it is not possible.
Disadvantages
1. Pointers are required to reference stored data, so technical skills are required.
2. Data manipulation operations (such as deletion and updating) are very complex.
3. Not useful for specific or big-size database design and modeling.
D. Network Data Model
Advantages
1. Reduced data redundancy
There is only one occurrence of a particular record in the database; other records refer to it using links (pointers), so there are no multiple occurrences of records.
2. Better performance
Records are accessed through pointers, so it is very easy to move from one owner record to another.
3. Data integrity
4. Enforced standards
5. Data independence
Disadvantages
1. Complexity
The conceptual design of this model is simple, but the design at the hardware level is very complex, because a large number of pointers are required to represent the relationships between owner and member records.
In this model the programmer works with links and defines how to traverse them to get the desired information, so proper technical skills are required.
E. Relational Data Model
Terminology
Relation: a table.
Foreign key: the primary key of one table used in another table as a reference.
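These two terms can be sketched with Python's sqlite3 (the table and column names are illustrative assumptions): department's primary key reappears in employee as a foreign key, and a join recombines the two relations through that key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce foreign keys

# A relation (table) with a primary key.
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")

# department's primary key used here as a foreign-key reference.
conn.execute("""CREATE TABLE employee (
                  emp_id  INTEGER PRIMARY KEY,
                  name    TEXT,
                  dept_id INTEGER REFERENCES department(dept_id))""")

conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES (10, 'Asha', 1)")

# Join the two relations through the key relationship.
row = conn.execute("""SELECT e.name, d.name FROM employee e
                      JOIN department d ON e.dept_id = d.dept_id""").fetchone()
print(row)  # ('Asha', 'Sales')
```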
Advantage
1. Simplicity
It is very easy and simple to design and implement at the logical level, using appropriate attributes and data values in tabular format.
2. Flexibility
3. Data independence
We can change the structure of the database without affecting the application programs.
4. Structural independence
This model is concerned only with the data, not with the physical structure, which improves performance in terms of processing time and storage space.
5. Query capability
This model supports high-level query languages such as SQL (Structured Query Language), which avoid complex database navigation and do not require pointers.
6. Mature Technology
Disadvantages
1. It has limited ability to deal with binary large objects such as images, spreadsheets, audio, video, etc.
2. This model increases hardware overheads, which are costly.
3. Separating the physical data from the logical data requires more powerful hardware (processors and memory), which makes the cost of the database high.
Data Model Schema and Instance
o The data stored in the database at a particular moment of time is called an instance of the database.
o The overall design of a database is called the schema.
o A database schema is the skeleton structure of the database. It represents the logical view of the entire database.
o A schema contains schema objects such as tables, foreign keys, primary keys, views, columns, data types, stored procedures, etc.
o A database schema can be represented by a visual diagram that shows the database objects and their relationships with each other.
o A database schema is designed by the database designers to help programmers whose software will interact with the database. The process of database creation is called data modeling.
A schema diagram can display only some aspects of a schema, such as the names of record types, data types, and constraints. Other aspects can't be specified through the schema diagram. For example, the given figure shows neither the data type of each data item nor the relationships among the various files.
In the database, the actual data changes quite frequently. For example, in the given figure, the database changes whenever we add a new grade or a new student. The data at a particular moment of time is called an instance of the database.
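The distinction can be sketched with sqlite3: the CREATE TABLE statement is the schema (it rarely changes), while the rows present at any given moment form the current instance (the table and the rows are illustrative assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the overall design -- table name, columns, types, constraints.
conn.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT, grade TEXT)")

# Instance at time t1: the data stored at this particular moment.
conn.execute("INSERT INTO student VALUES (1, 'Alice', 'A')")
instance_t1 = conn.execute("SELECT * FROM student").fetchall()

# The instance changes frequently (a new student is added) ...
conn.execute("INSERT INTO student VALUES (2, 'Bob', 'B')")
instance_t2 = conn.execute("SELECT * FROM student").fetchall()

# ... but the schema stays the same.
print(len(instance_t1), len(instance_t2))  # 1 2
```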
Components of DBMS
There are many components in a DBMS, and each has a significant task. A database environment is a collection of components that regulates the use, management, and grouping of data. These components include people, techniques for handling the database, data, hardware, software, etc.
1. Hardware
o Hardware means the physical parts of the DBMS: output devices such as printers and monitors, and storage devices such as hard disks.
o In a DBMS, hardware is the most visible part. Equipment such as printers, computers, and scanners is used to capture data and present output to the user.
o With the help of hardware, the DBMS can access and update the database.
o A server can store a large amount of data, which can be shared with users through their own systems.
o The database can run on any system, ranging from microcomputers to mainframe computers. The hardware also provides the interface between the real world and the database.
o When we run database software such as MySQL, we type commands with the keyboard, and the RAM, ROM, and processor of the computer system do the work.
2. Software
o Software is the main component of the DBMS.
o Software is the collection of programs used to instruct the computer about its work. It consists of the procedures, programs, and routines associated with the computer system's operation and performance.
o The software includes system software such as network software and the operating system. The database software is used to access the database, and database applications perform the tasks.
o This software understands the database access language, converts it into actual database commands, and executes them on the database.
o This is the main component, as the entire database operation works through software. Database software can be seen as a wrapper around the physical database that provides an easy interface for the user to store, update, and delete data.
o Examples of DBMS software include MySQL, Oracle, SQL Server, dBase, FileMaker, Clipper, FoxPro, Microsoft Access, etc.
3. Data
o Data means the collection of raw facts stored in the database, from which meaningful information is generated.
o The database can store any form of data, such as structured data, unstructured data, and logical data.
o Structured data is highly specific and has a defined format. Unstructured data is a collection of different types of data stored in their native formats.
o The database is the structure on which the DBMS operates: with it we can create, access, and update data. The main purpose of a database is to create and manage the data within it.
o Data is the most important part of the DBMS. The database contains the actual data and metadata, where metadata means data about data.
o For example, when a user stores data in a database, information such as the size of the data, the name of the data, and data related to the user is also stored. This information is called metadata.
4. Procedures
o Procedures are general instructions or guidelines for using the DBMS: how to set up and install the database, how to log in and log out, how to manage the database, how to take backups, and how to generate reports.
o With procedures we can validate data, control access, and reduce traffic between the server and the clients. The DBMS can offer better performance for extensive or complex business logic when the user follows all the procedures correctly.
o The main purpose of procedures is to guide the user during the management and operation of the database.
o A database procedure is similar to a database function. The major difference is that a function can be used like an SQL expression within a statement, whereas a procedure is invoked using the DBMS's CALL statement.
o Database procedures can be created in two ways in enterprise architecture: as an individual object (the default object), or as operations in a container.
The general skeleton of a database procedure (Oracle PL/SQL style; the angle-bracketed parts are placeholders) is:

CREATE OR REPLACE PROCEDURE <procedure_name>
    (<parameter> <datatype>, ...)
IS
    Declaration section
BEGIN
    Execution section
EXCEPTION
    Exception section
END;
5. Database Access Language
o Database Access Language is a simple language that allows users to write commands to perform the desired operations on the data stored in the database.
o It is used to write commands that access, insert, update, and delete data stored in a database.
o Users write commands or queries in the Database Access Language before submitting them to the database for execution.
o Using the language, users can create new databases and tables, insert data, and delete data.
o The main example of a database language is SQL (Structured Query Language), as used by products such as MS Access and Oracle. A database language comprises two sub-languages:
1. Data Definition Language (DDL): It is used to construct a database. DDL implements the database schema at the physical, logical, and external levels.
The following commands serve as the base for all DDL commands:
o ALTER <object>
o COMMENT
o CREATE <object>
o DESCRIBE <object>
o DROP <object>
o SHOW <object>
o USE <object>
2. Data Manipulation Language (DML): It is used to access a database. DML provides the statements to retrieve, modify, insert, and delete data from the database.
The following commands serve as the base for all DML commands:
o INSERT
o UPDATE
o DELETE
o LOCK
o CALL
o EXPLAIN
o PLAN
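The DDL/DML split can be sketched with sqlite3, which supports the core commands of both sub-languages (the table and its columns are illustrative assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and change the schema.
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("ALTER TABLE book ADD COLUMN price REAL")  # change the structure

# DML: manipulate the data inside that schema.
conn.execute("INSERT INTO book VALUES (1, 'DBMS Notes', 9.99)")
conn.execute("UPDATE book SET price = 7.99 WHERE id = 1")
price = conn.execute("SELECT price FROM book WHERE id = 1").fetchone()[0]
conn.execute("DELETE FROM book WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM book").fetchone()[0]

# DDL again: DROP removes the object itself, not just its rows.
conn.execute("DROP TABLE book")

print(price, remaining)  # 7.99 0
```

Note how DDL statements change what the database *is* (its tables and columns), while DML statements change what the database *contains*.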
6. People
o The people who control and manage the databases and perform different types of operations on them in the DBMS.
o They include database administrators, software developers, and end users.
o Database administrator: the person who manages the complete database management system. The DBA takes care of the security of the DBMS, its availability, managing license keys, managing user accounts and access, etc.
o Software developer: this group is involved in developing and designing parts of the DBMS. They can handle massive quantities of data, modify and edit databases, design and develop new databases, and troubleshoot database issues.
o End user: modern web and mobile applications are programmed to collect user data and store it on a DBMS running on their servers. End users are the ones who store, retrieve, update, and delete data.
o The users of the database can be classified into different groups:
i. Naive users
v. Application users
1. Internal Level
o The internal level has an internal schema, which describes the physical storage structure of the database.
o The internal schema is also known as the physical schema.
o It uses the physical data model and defines how the data will be stored in blocks.
o The physical level is used to describe complex low-level data structures in detail.
o The internal level is generally concerned with activities such as storage space allocation. For example: B-trees, hashing, etc.
2. Conceptual Level
Data Dictionary
A data dictionary contains metadata, i.e., data about the database. It is very important, as it records what is in the database, who is allowed to access it, where the database is physically stored, and so on. The users of the database normally don't interact with the data dictionary; it is handled only by the database administrators.
The data dictionary in general contains information about the following −
• Names of all the database tables and their schemas.
• Details about all the tables in the database, such as their owners,
their security constraints, when they were created etc.
• Physical information about the tables such as where they are stored
and how.
• Table constraints such as primary key attributes, foreign key information
etc.
• Information about the database views that are visible.
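As a sketch, SQLite exposes part of its own data dictionary through the built-in sqlite_master catalog table, which records the names and defining schemas of all tables (the two tables created here are illustrative assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, title TEXT)")

# Query the catalog: metadata about the tables, not the data itself.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'employee'").fetchone()[0]

print(tables)  # ['department', 'employee']
print(schema)  # the CREATE TABLE statement that defined employee
```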
This is a data dictionary describing a table that contains employee details.
Database administration
Database administration is the function of managing and maintaining database
management systems (DBMS) software. Mainstream DBMS software such as
Oracle, IBM Db2 and Microsoft SQL Server need ongoing management. As such,
corporations that use DBMS software often hire specialized information
technology personnel called database administrators, or DBAs.
Responsibilities
• Installation, configuration and upgrading of Database server software and
related products.
• Evaluate Database features and Database related products.
• Establish and maintain sound backup and recovery policies and
procedures.
• Take care of the Database design and implementation.
• Implement and maintain database security (create and maintain users and
roles, assign privileges).
• Database tuning and performance monitoring.
• Application tuning and performance monitoring.
• Setup and maintain documentation and standards.
• Plan growth and changes (capacity planning).
• Work as part of a team and provide 24/7 support when required.
• Do general technical troubleshooting and give consultation to development teams.
• Database recovery
1. Data Definition Language (DDL)
o DDL stands for Data Definition Language. It is used to define the database structure or pattern.
o It is used to create schemas, tables, indexes, constraints, etc. in the database.
o Using DDL statements, you can create the skeleton of the database.
o DDL stores metadata, such as the number of tables and schemas, their names, indexes, the columns in each table, constraints, etc.
Here are some tasks that come under DDL:
o Create: It is used to create objects in the database.
o Alter: It is used to alter the structure of the database.
o Drop: It is used to delete objects from the database.
o Truncate: It is used to remove all records from a table.
o Rename: It is used to rename an object.
o Comment: It is used to add comments to the data dictionary.
These commands are used to update the database schema that's why they come
under Data definition language.
2. Data Manipulation Language (DML)
DML stands for Data Manipulation Language. It is used for accessing and
manipulating data in a database. It handles user requests.
Most users accessed such systems via computer terminals that did not
have processing power and only provided display capabilities.
Web servers or e-mail servers also fall into the specialized server category.
The resources provided by specialized servers can be accessed by many
client machines.
The client machines provide the user with the appropriate interfaces to
utilize these servers, as well as with local processing power to run local
applications.
o Heterogeneity:
This refers to the similarities or differences among the databases, system components, and data models.
Common Architecture Models of Distributed Database Systems:
o Client-Server Architecture of DDBMS:
This is a two-level architecture in which the main functionality is divided between clients and servers. The server provides functions such as transaction management, data management, query processing, and optimization.
o Peer-to-Peer Architecture of DDBMS:
In this architecture, each node or peer is considered a server as well as a client and performs its database services in both roles. The peers coordinate their efforts and share their resources with one another.
o Multi-DBMS Architecture of DDBMS:
This is an amalgam of two or more independent database systems that functions as a single integrated database system.
Types of Distributed Database Systems:
o Replication:
This method involves redundantly storing the full relation at two or more sites. Since the complete database can be accessed from each site, it becomes a redundant database. Systems preserve copies of the data as a result of replication.
This has advantages because it makes more data accessible at many locations, and query requests can be handled in parallel.
However, there are drawbacks as well. Data must be updated frequently: any change performed at one site must be recorded at every site where that relation is stored, to avoid inconsistent results, and this incurs a great deal of overhead. Concurrency control also becomes far more complicated, since concurrent access must now be monitored across several sites.
o Fragmentation:
According to this method, the relations are divided (i.e., broken up into smaller fragments), and each fragment is stored at the various sites where it is needed. To ensure there is no data loss, the fragments must be created in a way that allows reconstruction of the original relation.
Since Fragmentation doesn't result in duplicate data, consistency is not a
concern.
Ways of fragmentation:
o Horizontal Fragmentation:
In horizontal fragmentation, the relational table or schema is broken down into groups of one or more rows, and each group forms one fragment of the schema. It is also called splitting by rows.
o Vertical Fragmentation:
In vertical fragmentation, a relational table or schema is divided into several smaller schemas. A common candidate key must be present in each fragment in order to guarantee a lossless join. This is also called splitting by columns.
Note: Most of the time, a hybrid approach of replication and fragmentation is
used.
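The two fragmentation styles can be sketched on a toy in-memory relation of (id, name, city) rows. The reconstruction steps show why horizontal fragments are recombined by union and vertical fragments by a join on the shared key (the data itself is an illustrative assumption):

```python
# A relation as a list of (id, name, city) tuples.
relation = [(1, "Asha", "Pune"), (2, "Bela", "Delhi"), (3, "Chand", "Pune")]

# Horizontal fragmentation: split by rows (here, by city).
frag_pune = [r for r in relation if r[2] == "Pune"]
frag_other = [r for r in relation if r[2] != "Pune"]
# Lossless reconstruction: the union of the row fragments.
horizontal_rebuilt = sorted(frag_pune + frag_other)

# Vertical fragmentation: split by columns, keeping the key (id) in each part.
frag_names = {r[0]: r[1] for r in relation}   # id -> name
frag_cities = {r[0]: r[2] for r in relation}  # id -> city
# Lossless reconstruction: join the column fragments on the common key.
vertical_rebuilt = sorted((i, frag_names[i], frag_cities[i]) for i in frag_names)

print(horizontal_rebuilt == relation)  # True
print(vertical_rebuilt == relation)    # True
```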
Applications of Distributed Database Systems:
o Multimedia apps use it.
o Manufacturing control systems also make use of it.
o Another application is corporate management information systems.
o It is used in hotel chains, military command systems, etc.
2. Workgroup Database:
The Progress Workgroup RDBMS offers many of the same powerful capabilities
as the Enterprise RDBMS.
It is optimized for workgroups of 2 to 50 concurrent users and provides a cost-effective, department-level solution that includes high performance, multi-user support, and cross-platform interoperability, at an excellent value.
It meets the needs of workgroup applications by running on a wide variety of
hardware and operating system platforms.
Because the flexible database architecture provides excellent throughput on all
platforms, a database deployed on one machine can serve applications on other
systems and network configurations.
3. Enterprise Database:
The OpenEdge Enterprise RDBMS is designed for large user environments and
the transaction processing throughput of today's most demanding on-line
transaction processing (OLTP) applications.
Grounded in a flexible, multithreaded, multiserver architecture, the Enterprise
RDBMS is a powerful, open and large-scale enterprise database that can run
across multiple hardware platforms and networks.
The Enterprise RDBMS includes all of the functionality needed to meet the most
demanding OLTP requirements.
These capabilities include row-level locking, roll-back and roll-forward recovery,
point-in-time recovery, distributed database management with two-phase
commit, integral support for fail-over cluster availability, a complete suite of
on-line utilities and support for the OpenEdge ABL as well as industry-standard
SQL.
The unique combination of power, flexibility and ease of operation makes the
Enterprise RDBMS an ideal engine for a wide range of commercial and data
processing applications.
Sophisticated self-tuning capabilities make the Enterprise RDBMS easier to install, tune, and manage than other products. With low administration costs, low initial license costs, minimal upgrade fees, and limited software implementation costs, the Enterprise RDBMS provides a significant cost-of-ownership advantage over competing databases.
Operating systems now have the ability to support data files larger than 2
gigabytes (as an OS configuration option). The OpenEdge Enterprise database
allows you to enable (up to terabyte-size) large files for the database, which
simplifies management of your operation since there are fewer files to manage.
The use of large files also permits increased maximum capacity for the
database.
-SPIN: Sets the number of times a process retries to acquire a latch before pausing. It uses the spin-lock algorithm, which is very efficient when you have multiple processors.
Concept of Data Warehouse
A data warehouse is a type of data management system that is designed to
enable and support business intelligence (BI) activities, especially analytics.
Data warehouses are solely intended to perform queries and analysis and often
contain large amounts of historical data. The data within a data warehouse is
usually derived from a wide range of sources such as application log files and
transaction applications.
A data warehouse centralizes and consolidates large amounts of data from
multiple sources. Its analytical capabilities allow organizations to derive
valuable business insights from their data to improve decision-making. Over
time, it builds a historical record that can be invaluable to data scientists and
business analysts. Because of these capabilities, a data warehouse can be
considered an organization’s “single source of truth.”
A typical data warehouse often includes the following elements:
• A relational database to store and manage data
• An extraction, loading, and transformation (ELT) solution for preparing
the data for analysis
• Statistical analysis, reporting, and data mining capabilities
• Client analysis tools for visualizing and presenting data to business users
• Other, more sophisticated analytical applications that generate actionable information by applying data science and artificial intelligence (AI) algorithms, or graph and spatial features that enable more kinds of analysis of data at scale
Benefits of a Data Warehouse
Data warehouses offer the overarching and unique benefit of allowing
organizations to analyze large amounts of variant data and extract significant
value from it, as well as to keep a historical record.
Four unique characteristics allow data warehouses to deliver this overarching
benefit. According to this definition, data warehouses are
• Subject-oriented. They can analyze data about a particular subject or
functional area (such as sales).
• Integrated. Data warehouses create consistency among different data
types from disparate sources.
• Nonvolatile. Once data is in a data warehouse, it’s stable and doesn’t
change.
• Time-variant. Data warehouse analysis looks at change over time.
A well-designed data warehouse will perform queries very quickly, deliver high
data throughput, and provide enough flexibility for end users to “slice and dice”
or reduce the volume of data for closer examination to meet a variety of
demands—whether at a high level or at a very fine, detailed level. The data
warehouse serves as the functional foundation for middleware BI
environments that provide end users with reports, dashboards, and other
interfaces.
Data Warehouse Architecture
The architecture of a data warehouse is determined by the organization’s specific
needs. Common architectures include
• Simple. All data warehouses share a basic design in which metadata,
summary data, and raw data are stored within the central repository of
the warehouse. The repository is fed by data sources on one end and
accessed by end users for analysis, reporting, and mining on the other
end.
• Simple with a staging area. Operational data must be cleaned and
processed before being put in the warehouse. Although this can be done
programmatically, many data warehouses add a staging area for data
before it enters the warehouse, to simplify data preparation.
• Hub and spoke. Adding data marts between the central repository and
end users allows an organization to customize its data warehouse to
serve various lines of business. When the data is ready for use, it is
moved to the appropriate data mart.
• Sandboxes. Sandboxes are private, secure, safe areas that allow
companies to quickly and informally explore new datasets or ways of
analyzing data without having to conform to or comply with the formal
rules and protocol of the data warehouse.
Concept of Data Mining
Data mining is the process of searching and analyzing a large batch of raw data
in order to identify patterns and extract useful information.
Companies use data mining software to learn more about their customers. It
can help them to develop more effective marketing strategies, increase sales,
and decrease costs. Data mining relies on effective data collection,
warehousing, and computer processing.
KEY TAKEAWAYS
• Data mining is the process of analyzing a large batch of information to
discern trends and patterns.
• Data mining can be used by corporations for everything from learning
about what customers are interested in or want to buy to fraud detection
and spam filtering.
• Data mining programs break down patterns and connections in data based
on what information users request or provide.
• Social media companies use data mining techniques to commodify their
users in order to generate profit.
• This use of data mining has come under criticism lately as users are often
unaware of the data mining happening with their personal information,
especially when it is used to influence preferences.
How Data Mining Works
Data mining involves exploring and analyzing large blocks of information to glean
meaningful patterns and trends. It is used in credit risk
management, fraud detection, and spam filtering. It also is a market research
tool that helps reveal the sentiment or opinions of a given group of people. The
data mining process breaks down into four steps:
• Data is collected and loaded into data warehouses on-site or on a cloud
service.
• Business analysts, management teams, and information technology
professionals access the data and determine how they want to organize it.
• Custom application software sorts and organizes the data.
• The end user presents the data in an easy-to-share format, such as a graph or table.
Data Mining Techniques
Data mining uses algorithms and various other techniques to convert large
collections of data into useful output. The most popular types of data mining
techniques include:
1. Clustering
Clustering is a technique used to represent data visually — such as in graphs
that show buying trends or sales demographics for a particular product.
Clustering refers to the process of grouping a series of different data points
based on their characteristics. By doing so, data miners can seamlessly divide
the data into subsets, allowing for more informed decisions in terms of broad
demographics (such as consumers or users) and their respective behaviors.
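As a sketch, here is a tiny k-means-style clustering of one-dimensional purchase amounts into two groups. The data, the choice of k = 2, and the simple min/max initialization are illustrative assumptions, not a production algorithm:

```python
def kmeans_1d(points, k=2, iters=20):
    """Minimal k-means on 1-D data: assign each point to its nearest
    centroid, then move each centroid to the mean of its points."""
    centroids = [min(points), max(points)]  # spread-out initialization (k = 2)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

# Purchase amounts: small everyday buys vs. large occasional buys.
amounts = [3, 4, 5, 4, 90, 100, 95]
small, large = kmeans_1d(amounts, k=2)
print(sorted(small), sorted(large))  # [3, 4, 4, 5] [90, 95, 100]
```

The two resulting subsets correspond to two customer behaviors (frequent small purchases vs. rare large ones), which is exactly the kind of grouping a data miner would then interpret.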
Different data mining process models have different steps, though the general
process is usually similar. For example, the Knowledge Discovery in Databases
(KDD) model has nine steps, the CRISP-DM model has six steps, and the SEMMA
process model has five steps.
Applications of Data Mining
In today's age of information, almost any department, industry, sector, or
company can make use of data mining.
Sales
Data mining encourages smarter, more efficient use of capital to drive revenue
growth. Consider the point-of-sale register at your favorite local coffee shop.
For every sale, that coffeehouse collects the time a purchase was made and
what products were sold. Using this information, the shop can strategically
craft its product line.
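A minimal sketch of that idea in Python, with an invented point-of-sale log (the hours and product names are made up):

```python
from collections import Counter

# Hypothetical point-of-sale log: (hour of purchase, product sold).
sales_log = [
    (8, "latte"), (8, "espresso"), (9, "latte"),
    (12, "sandwich"), (12, "latte"), (17, "espresso"),
]

# Which products sell most overall, and which hours are busiest?
by_product = Counter(product for _, product in sales_log)
by_hour = Counter(hour for hour, _ in sales_log)

print(by_product.most_common(1))
print(by_hour.most_common(3))
```

Even this crude tally answers the product-line question: stock more of the top seller and staff up for the busiest hours.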
Marketing
Once the coffeehouse above knows its ideal line-up, it's time to implement the
changes. However, to make its marketing efforts more effective, the store can
use data mining to understand where its clients see ads, what demographics to
target, where to place digital ads, and what marketing strategies most resonate
with customers. This includes aligning marketing campaigns, promotional
offers, cross-sell offers, and programs to the findings of data mining.
Manufacturing
For companies that produce their own goods, data mining plays an integral part
in analyzing how much each raw material costs, what materials are being used
most efficiently, how time is spent along the manufacturing process, and what
bottlenecks negatively impact the process. Data mining helps ensure the flow
of goods is uninterrupted.
Fraud Detection
The heart of data mining is finding patterns, trends, and correlations that link
data points together. Therefore, a company can use data mining to identify
outliers or correlations that should not exist. For example, a company may
analyze its cash flow and find a recurring transaction to an unknown account.
If this is unexpected, the company may wish to investigate whether funds are
being mismanaged.
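One simple way to flag such outliers is a standard-deviation (z-score) test: values that sit unusually far from the mean are reported for investigation. The payment amounts below are invented and the threshold is arbitrary.

```python
import statistics

def flag_outliers(amounts, threshold=3.0):
    """Flag values more than `threshold` standard deviations
    from the mean -- a crude but common outlier test."""
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts)
    if stdev == 0:
        return []
    return [a for a in amounts if abs(a - mean) / stdev > threshold]

# Mostly routine payments plus one suspicious transfer (invented data).
payments = [100, 102, 98, 101, 99, 100, 5000]
print(flag_outliers(payments, threshold=2.0))
```

Real fraud-detection systems use far richer models, but the principle is the same: learn what "normal" looks like, then surface what deviates from it.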
Human Resources
Human resources departments often have a wide range of data available for
processing including data on retention, promotions, salary ranges, company
benefits, use of those benefits, and employee satisfaction surveys. Data mining
can correlate this data to get a better understanding of why employees leave
and what entices new hires.
Customer Service
Customer satisfaction may be caused (or destroyed) for a variety of reasons.
Imagine a company that ships goods. A customer may be dissatisfied with
shipping times, shipping quality, or communications. The same customer may
be frustrated with long telephone wait times or slow e-mail responses. Data
mining gathers operational information about customer interactions and
summarizes the findings to pinpoint weak points and highlight what the
company is doing right.
Data warehouses:
A Data Warehouse is the technology that collects the data from various sources
within the organization to provide meaningful business insights. The huge
amount of data comes from multiple places such as Marketing and Finance.
The extracted data is utilized for analytical purposes and helps in decision-
making for a business organization. The data warehouse is designed for the
analysis of data rather than transaction processing.
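A warehouse load can be sketched in miniature with Python's built-in SQLite: rows are extracted from hypothetical source systems (here, marketing and finance), loaded into one analysis-oriented table, and queried analytically rather than transactionally. All table and column names are invented.

```python
import sqlite3

# Extract: rows pulled from two hypothetical source systems.
marketing_rows = [("2024-01", "ads", 500.0)]
finance_rows = [("2024-01", "payroll", 9000.0),
                ("2024-02", "payroll", 9100.0)]

# Load everything into one table (the "warehouse").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend (month TEXT, category TEXT, amount REAL)")
conn.executemany("INSERT INTO spend VALUES (?, ?, ?)",
                 marketing_rows + finance_rows)

# Analytical query: total spend per month across all source systems.
rows = list(conn.execute(
    "SELECT month, SUM(amount) FROM spend GROUP BY month ORDER BY month"))
for month, total in rows:
    print(month, total)
conn.close()
```

A real warehouse adds cleaning, history, and scale, but the extract-load-analyze shape is the same.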
Data Repositories:
A data repository generally refers to a destination for data storage. However,
many IT professionals use the term more precisely to refer to a particular
kind of setup within an IT structure, for example, a group of databases in
which an organization has kept various kinds of information.
Object-Relational Database:
A combination of an object-oriented database model and relational database
model is called an object-relational model. It supports Classes, Objects,
Inheritance, etc.
One of the primary objectives of the object-relational data model is to close
the gap between the relational database model and the object-oriented
practices frequently used in many programming languages, for example, C++,
Java, C#, and so on.
Transactional Database:
A transactional database refers to a database management system (DBMS) that
can undo a database transaction if it is not completed appropriately. Although
this was once a distinguishing capability, today most relational database
systems support transactional data processing.
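The undo (rollback) behavior described above can be demonstrated with Python's built-in sqlite3 module. The account data and the simulated failure are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, balance REAL)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100.0)")
conn.commit()

try:
    # Start a transfer, then hit an error before it completes.
    conn.execute("UPDATE accounts SET balance = balance - 40 "
                 "WHERE name = 'alice'")
    raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    conn.rollback()  # undo the partial transfer

balance = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()[0]
print(balance)  # the balance is unchanged after the rollback
conn.close()
```

Because the failed transaction was rolled back rather than committed, the database is left exactly as it was before the transfer began.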
Advantages of Data Mining
o The data mining technique enables organizations to obtain knowledge-based
data.
o Data mining enables organizations to make profitable adjustments to
operations and production.
o Compared with other statistical data applications, data mining is
cost-efficient.
o Data mining helps the decision-making process of an organization.
o It facilitates the automated discovery of hidden patterns as well as the
prediction of trends and behaviors.
o It can be introduced in new systems as well as existing platforms.
o It is a quick process that makes it easy for new users to analyze enormous
amounts of data in a short time.
Disadvantages of Data Mining
o There is a possibility that organizations may sell useful customer data to
other organizations for money. As per reports, American Express has sold
credit card purchase data of its customers to other organizations.
o Much data mining analytics software is difficult to operate and requires
advanced training to work with.
o Different data mining tools operate in distinct ways due to the different
algorithms used in their design. Therefore, selecting the right data mining
tool is a very challenging task.
o Data mining techniques are not precise, so they may lead to serious
consequences in certain conditions.
Concept of Big Data
Data that is very large in size is called Big Data. Normally we work on data
of size MB (Word documents, Excel sheets) or at most GB (movies, code), but
data in petabytes, i.e., 10^15 bytes in size, is called Big Data. It is stated
that almost 90% of today's data has been generated in the past 3 years.
Sources of Big Data
These data come from many sources:
o Social networking sites: Facebook, Google, and LinkedIn all generate huge
amounts of data on a day-to-day basis, as they have billions of users
worldwide.
o E-commerce sites: Sites like Amazon, Flipkart, and Alibaba generate huge
amounts of logs from which users' buying trends can be traced.
o Weather stations: All the weather stations and satellites give very large
volumes of data, which are stored and processed to forecast the weather.
o Telecom companies: Telecom giants like Airtel and Vodafone study user trends
and publish their plans accordingly, and for this they store the data of
millions of users.
o Share market: Stock exchanges across the world generate huge amounts of data
through their daily transactions.
Big Data Characteristics
Big Data comprises amounts of data so large that they cannot be processed by
traditional data storage or processing units. It is used by many multinational
companies to process data and run the business of many organizations. Data
flows can exceed 150 exabytes per day before replication.
There are five V's of Big Data that explain its characteristics.
5 V's of Big Data
o Volume
o Veracity
o Variety
o Value
o Velocity
Volume
The name Big Data itself is related to enormous size. Big Data refers to the
vast volumes of data generated daily from many sources, such as business
processes, machines, social media platforms, networks, human interactions,
and many more.
Facebook alone generates approximately a billion messages per day, records
around 4.5 billion clicks of the "Like" button, and receives more than 350
million new posts each day. Big data technologies are built to handle such
large amounts of data.
Variety
Big Data can be structured, unstructured, or semi-structured, collected from
different sources. In the past, data was collected only from databases and
spreadsheets, but these days data comes in a wide array of forms: PDFs,
emails, audio, social media posts, photos, videos, etc.
The data is categorized as below:
a. Structured Data: Data that conforms to a fixed schema, such as tables in a
relational database or rows in a spreadsheet.
b. Semi-structured Data: Data that does not fit a rigid schema but still
contains organizational markers, such as JSON, XML, and email messages.
c. Unstructured Data: All the unstructured files, log files, audio files, and
image files are included in unstructured data. Some organizations have much
data available but do not know how to derive value from it, since the data is
raw.
Velocity
Velocity plays an important role compared to the other characteristics.
Velocity refers to the speed at which data is created in real time. It
encompasses the speed of incoming data streams, the rate of change, and
bursts of activity. A primary aspect of Big Data is delivering the demanded
data rapidly.
Big data velocity deals with the speed at which data flows from sources like
application logs, business processes, networks, social media sites, sensors,
mobile devices, etc.