
Ex.No.: 1

QUERY OPTIMIZER

Write SQL queries to retrieve employee information and use metadata to optimize the
queries
PROBLEM STATEMENT:

Database Setup: Ensure you have a relational database with a table named Employees with
columns like EmployeeID, FirstName, LastName, Department, Position, Salary, and
HireDate.

Task 1: Write Basic Queries

Write SQL queries to:
1. Retrieve all employee details.
2. Find employees in a specific department.
3. List employees hired after a specific date.
4. Retrieve employees sorted by salary in descending order.

Task 2: Access Metadata

Use SQL commands to access metadata information:
1. Use DESCRIBE Employees; or SHOW COLUMNS FROM Employees; to get
the structure of the Employees table.
2. Use SHOW INDEXES FROM Employees; to view existing indexes on the
table.

Task 3: Analyze Query Performance

Use the EXPLAIN command to analyze the performance of the queries you wrote in Task 1.

Task 4: Optimize Queries

Based on the metadata information and query performance analysis, modify your queries or indexes to improve performance. For example:
1. Create an index on the Department column if it’s frequently queried.
2. Rewrite queries to utilize indexes more effectively.

Task 5: Evaluate Optimized Queries

Re-run the EXPLAIN command on the optimized queries and compare the performance with the initial queries. Document the improvements.

INTRODUCTION: "Query with optimizer by accessing the metadata" refers to the process
in which a database management system (DBMS) uses an internal component called a query
optimizer to determine the most efficient way to execute a database query. The optimizer
makes decisions based on metadata, which is data that provides information about the
structure and characteristics of the database.

Breaking Down the Terms:

1. Query:
o A request to retrieve or manipulate data stored in a database. For example, a
SQL query might request all records from a table where a certain condition is
met.
2. Optimizer:
o A software component within the DBMS that analyzes different ways to
execute a query. It considers various strategies or "execution plans" to find the
one that will complete the task most efficiently, often measured in terms of
time and resource usage (like CPU, memory, and I/O operations).
3. Metadata:
o Data about the data stored in the database. This includes information like:
- The size of tables.
- The number of rows in a table.
- The distribution of values within columns.
- The presence and type of indexes.
- Data types and constraints on columns.
o Metadata helps the optimizer understand the characteristics of the data, which is crucial for making informed decisions about the best way to execute a query.
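In MySQL, much of this metadata is exposed through the INFORMATION_SCHEMA views. As a brief sketch (MySQL syntax is assumed; for InnoDB the row count is an estimate), the statistics the optimizer draws on can be inspected directly:

-- Table-level statistics: approximate row count and storage size
SELECT TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'Employees';

-- Column definitions: data types and nullability
SELECT COLUMN_NAME, DATA_TYPE, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'Employees';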

OBJECTIVES:

To create a relational database table named Employees, insert values, and display employee details using various SQL queries. Additionally, access metadata, analyze query performance, optimize queries, and evaluate the performance improvements.

APPLICATIONS:

- Relational Databases: SQL Query Optimization and Complex Query Handling
- Data Warehousing: ETL Processes and Analytical Queries
- Big Data and NoSQL Systems: Query Optimization in Distributed Systems and MapReduce Optimization
- Business Intelligence (BI) Tools: Dashboard Performance
- E-commerce and Online Transaction Processing (OLTP)
- Cloud Databases and Database as a Service (DBaaS)
- Content Management Systems (CMS)
- Internet of Things (IoT)
- Geospatial Databases

STEP-BY-STEP PROCESS:
Create the Employees Table and Insert Values

1. Create the Table
2. Insert Sample Data

Task 1: Write Basic Queries

1. Retrieve all employee details:
2. Find employees in a specific department:
3. List employees hired after a specific date:
4. Retrieve employees sorted by salary in descending order:

Task 2: Access Metadata

1. Get the structure of the Employees table:
2. View existing indexes on the table:

Task 3: Analyze Query Performance

Use the EXPLAIN command to understand how queries are executed:

1. Analyze retrieval of all employee details:
2. Analyze finding employees in a specific department:
3. Analyze listing employees hired after a specific date:
4. Analyze retrieving employees sorted by salary in descending order:

Task 4: Optimize Queries

1. Create an index on the Department column (if frequently queried):
2. Rewrite queries to utilize indexes more effectively:

Task 5: Evaluate Optimized Queries

Re-run the EXPLAIN command to assess if the optimization improved performance.

1. Analyze retrieval of all employee details (post-optimization):
2. Analyze finding employees in a specific department (post-optimization):
3. Analyze listing employees hired after a specific date (post-optimization):
4. Analyze retrieving employees sorted by salary in descending order (post-optimization):

AIM:

The aim of this project is to optimize SQL queries for a relational database that stores employee information, using metadata and EXPLAIN-based performance analysis to retrieve data efficiently.

ALGORITHMS:
- Create the Employees table and insert values.
- Task 1: Write Basic Queries
Retrieve all employee details, find employees in a specific department, list employees hired after a specific date, and retrieve employees sorted by salary in descending order.
- Task 2: Access Metadata
Get the structure of the Employees table and view existing indexes on the table.
- Task 3: Analyze Query Performance
Analyze the performance of the queries using EXPLAIN; for example, analyze the performance of the query that retrieves all employee details.
- Task 4: Optimize Queries
1. Create an index on the Department column.
2. Rewrite queries to utilize indexes more effectively. For example, if you frequently query employees by department, ensure the query uses the index.
- Task 5: Evaluate Optimized Queries
1. Re-run the EXPLAIN command on the optimized queries and compare the plans with the initial ones.

IMPLEMENTATION:

Create the Employees Table and Insert Values

1. Create the Table
CREATE TABLE Employees (
EmployeeID INT AUTO_INCREMENT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
Department VARCHAR(50),
Position VARCHAR(50),
Salary DECIMAL(10, 2),
HireDate DATE
);

2. Insert Sample Data

INSERT INTO Employees (FirstName, LastName, Department, Position, Salary, HireDate)
VALUES
('John', 'Doe', 'IT', 'Software Engineer', 80000, '2022-01-15'),
('Jane', 'Smith', 'HR', 'HR Manager', 75000, '2021-03-22'),
('Emily', 'Jones', 'Finance', 'Accountant', 70000, '2023-05-10'),
('Michael', 'Brown', 'IT', 'System Analyst', 72000, '2022-07-30');

3. Display All Employee Details

SELECT * FROM Employees;

Task 1: Write Basic Queries

1. Retrieve all employee details:

SELECT * FROM Employees;

2. Find employees in a specific department:

Replace 'IT' with the desired department name.

SELECT * FROM Employees
WHERE Department = 'IT';

3. List employees hired after a specific date:

Replace '2022-01-01' with the date of interest.

SELECT * FROM Employees
WHERE HireDate > '2022-01-01';

4. Retrieve employees sorted by salary in descending order:

SELECT * FROM Employees
ORDER BY Salary DESC;

Task 2: Access Metadata


1. Get the structure of the Employees table:

DESCRIBE Employees;

or

SHOW COLUMNS FROM Employees;

2. View existing indexes on the table:

SHOW INDEXES FROM Employees;

Task 3: Analyze Query Performance

Use the EXPLAIN command to understand how queries are executed:

1. Analyze retrieval of all employee details:

EXPLAIN SELECT * FROM Employees;

2. Analyze finding employees in a specific department:

EXPLAIN SELECT * FROM Employees
WHERE Department = 'IT';

3. Analyze listing employees hired after a specific date:

EXPLAIN SELECT * FROM Employees
WHERE HireDate > '2022-01-01';

4. Analyze retrieving employees sorted by salary in descending order:

EXPLAIN SELECT * FROM Employees
ORDER BY Salary DESC;

Task 4: Optimize Queries

1. Create an index on the Department column (if frequently queried):

CREATE INDEX idx_department ON Employees(Department);

2. Rewrite queries to utilize indexes more effectively:

Ensure that the queries are designed to take advantage of indexes. For example, the index on
Department will be used if the query filters on this column:

EXPLAIN SELECT * FROM Employees
WHERE Department = 'IT';
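The ORDER BY Salary query can benefit in the same way. A sketch, assuming MySQL/InnoDB: with an index on Salary, a query that reads only indexed columns can be answered from the index in order, avoiding a filesort (EXPLAIN then shows "Using index" instead of "Using filesort"):

CREATE INDEX idx_salary ON Employees(Salary);

-- Covering query: EmployeeID (the primary key) is stored in the secondary index,
-- so both columns come straight from idx_salary in descending order
EXPLAIN SELECT EmployeeID, Salary FROM Employees ORDER BY Salary DESC;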

Task 5: Evaluate Optimized Queries


Re-run the EXPLAIN command to assess if the optimization improved performance.

1. Analyze retrieval of all employee details (post-optimization):

EXPLAIN SELECT * FROM Employees;

2. Analyze finding employees in a specific department (post-optimization):

EXPLAIN SELECT * FROM Employees
WHERE Department = 'IT';

3. Analyze listing employees hired after a specific date (post-optimization):

EXPLAIN SELECT * FROM Employees
WHERE HireDate > '2022-01-01';

4. Analyze retrieving employees sorted by salary in descending order (post-optimization):

EXPLAIN SELECT * FROM Employees
ORDER BY Salary DESC;

OUTPUT:
DESCRIBE Employees:

SHOW INDEXES FROM Employees:

EXPLAIN Output Examples:

RESULT: Thus, SQL queries to retrieve employee information were written and optimized using metadata and EXPLAIN-based performance analysis, and executed successfully.
Ex.No.: 2

DISTRIBUTED DATABASE

Create a distributed database, run various queries, and use stored procedures.

PROBLEM STATEMENT:

Table Creation and Data Insertion

Create Tables in Multiple Databases
- Task: Create three tables in each of two distributed databases (Database1 and Database2). Each table should have the same structure but may contain different subsets of data.
- Example Tables: Employees, Departments, and Salaries

Insert Data into Tables
- Task: Insert sample data into the Employees, Departments, and Salaries tables in both databases.

Combine Data from Multiple Tables Using UNION
- Task: Write a query to combine data from the Employees tables in both databases using the UNION operator. Include data from the Employees, Departments, and Salaries tables.

Handle Duplicate Records
- Task: Modify the query from Q3 to include duplicate records using UNION ALL.

Create a View to Aggregate Information
- Task: Create a view that aggregates employee information from all tables and databases. For example, show the total salary and average salary by department.

Query the View
- Task: Write a query to retrieve data from the view created in Q5. Display the department name, employee count, total salary, and average salary.

INTRODUCTION:

A distributed database is a collection of interrelated data that is distributed across multiple locations, which can be spread across different physical sites or nodes within a network. Unlike centralized databases, where data resides in a single location, a distributed database system (DDBS) manages data across different sites, ensuring consistency, reliability, and efficient querying despite the physical distribution.

OBJECTIVES:
Manage tables across distributed databases, perform data insertion, and run various queries to
combine and aggregate data.

APPLICATIONS:

- E-commerce Platforms
- Financial Services
- Healthcare Systems
- Social Media Networks

STEP-BY-STEP PROCESS:

1. Create Tables in Multiple Databases

Step 1.1: Create the Employees Table in Both Databases
Step 1.2: Create the Departments Table in Both Databases
Step 1.3: Create the Salaries Table in Both Databases

2. Insert Data into Tables

Step 2.1: Insert Data into the Employees Table
Step 2.2: Insert Data into the Departments Table
Step 2.3: Insert Data into the Salaries Table

3. Combine Data from Multiple Tables Using UNION

To combine data from the Employees table in both databases, you need to use the UNION
operator. Assuming you have connected to both databases and can query them together:

4. Handle Duplicate Records Using UNION ALL

If you want to include duplicate records, replace UNION with UNION ALL:

5. Create a View to Aggregate Information

Now, let's create a view that aggregates employee information, showing total salary and
average salary by department:

6. Query the View

Finally, query the view to retrieve the desired data:


AIM: Create and manage tables in two distributed databases, insert data, combine data using
SQL operations, handle duplicates, and aggregate information.
ALGORITHMS:

Table Creation:

- Define table schemas for Employees, Departments, and Salaries.
- Create these tables in both Database1 and Database2.

Data Insertion:

- Insert sample data into each table in both databases.

Data Combination Using UNION:

- Write a SQL query to combine data from the Employees tables in both databases using UNION.

Handling Duplicates with UNION ALL:

- Modify the previous query to use UNION ALL to include duplicates.

View Creation for Aggregation:

- Create a view that aggregates employee data, such as total and average salaries by department.

Query the View:

- Write a SQL query to retrieve aggregated data from the view.

IMPLEMENTATION:

1. Create Tables in Multiple Databases

Step 1.1: Create the Employees Table in Both Databases

Database1

CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
DepartmentID INT
);

Database2
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
DepartmentID INT
);
Step 1.2: Create the Departments Table in Both Databases

Database1

CREATE TABLE Departments (
DepartmentID INT PRIMARY KEY,
DepartmentName VARCHAR(100)
);

Database2

CREATE TABLE Departments (
DepartmentID INT PRIMARY KEY,
DepartmentName VARCHAR(100)
);
Step 1.3: Create the Salaries Table in Both Databases

Database1

CREATE TABLE Salaries (
EmployeeID INT PRIMARY KEY,
Salary DECIMAL(10, 2)
);

Database2

CREATE TABLE Salaries (
EmployeeID INT PRIMARY KEY,
Salary DECIMAL(10, 2)
);

2. Insert Data into Tables

Step 2.1: Insert Data into the Employees Table

Database1

INSERT INTO Employees (EmployeeID, FirstName, LastName, DepartmentID)
VALUES (1, 'John', 'Doe', 1), (2, 'Jane', 'Smith', 2);

Database2

INSERT INTO Employees (EmployeeID, FirstName, LastName, DepartmentID)
VALUES (3, 'Alice', 'Johnson', 1), (4, 'Bob', 'Brown', 3);
Step 2.2: Insert Data into the Departments Table

Database1

INSERT INTO Departments (DepartmentID, DepartmentName)
VALUES (1, 'HR'), (2, 'Finance');

Database2

INSERT INTO Departments (DepartmentID, DepartmentName)
VALUES (1, 'HR'), (3, 'IT');
Step 2.3: Insert Data into the Salaries Table

Database1

INSERT INTO Salaries (EmployeeID, Salary)
VALUES (1, 50000), (2, 60000);

Database2

INSERT INTO Salaries (EmployeeID, Salary)
VALUES (3, 70000), (4, 80000);

3. Combine Data from Multiple Tables Using UNION

To combine data from the Employees table in both databases, you need to use the UNION
operator. Assuming you have connected to both databases and can query them together:

-- Combining Employees data from both Database1 and Database2

SELECT EmployeeID, FirstName, LastName, DepartmentID FROM Database1.Employees
UNION
SELECT EmployeeID, FirstName, LastName, DepartmentID FROM Database2.Employees;

4. Handle Duplicate Records Using UNION ALL

If you want to include duplicate records, replace UNION with UNION ALL:

-- Combining Employees data from both Database1 and Database2, including duplicates

SELECT EmployeeID, FirstName, LastName, DepartmentID FROM Database1.Employees
UNION ALL
SELECT EmployeeID, FirstName, LastName, DepartmentID FROM Database2.Employees;

5. Create a View to Aggregate Information


Now, let's create a view that aggregates employee information, showing total salary and
average salary by department:

-- Create a view to aggregate employee information from both databases

CREATE VIEW EmployeeAggregates AS
SELECT D.DepartmentName, COUNT(E.EmployeeID) AS EmployeeCount,
SUM(S.Salary) AS TotalSalary, AVG(S.Salary) AS AvgSalary
FROM Database1.Employees E
JOIN Database1.Departments D ON E.DepartmentID = D.DepartmentID
JOIN Database1.Salaries S ON E.EmployeeID = S.EmployeeID
GROUP BY D.DepartmentName

UNION ALL

SELECT D.DepartmentName, COUNT(E.EmployeeID) AS EmployeeCount,
SUM(S.Salary) AS TotalSalary, AVG(S.Salary) AS AvgSalary
FROM Database2.Employees E
JOIN Database2.Departments D ON E.DepartmentID = D.DepartmentID
JOIN Database2.Salaries S ON E.EmployeeID = S.EmployeeID
GROUP BY D.DepartmentName;
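Note that because the view unions per-database aggregates, a department present in both databases (such as HR in the sample data) appears as two rows. If a single row per department is wanted, the view's output can be re-aggregated; a sketch, where the average is recomputed as a weighted value rather than an average of averages:

SELECT DepartmentName,
SUM(EmployeeCount) AS EmployeeCount,
SUM(TotalSalary) AS TotalSalary,
SUM(TotalSalary) / SUM(EmployeeCount) AS AvgSalary
FROM EmployeeAggregates
GROUP BY DepartmentName;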

6. Query the View

Finally, query the view to retrieve the desired data:

-- Querying the view to display department name, employee count, total salary, and average salary
SELECT DepartmentName, EmployeeCount, TotalSalary, AvgSalary
FROM EmployeeAggregates;
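The exercise title also calls for stored procedures, which this section does not otherwise show. A minimal sketch (MySQL syntax; the procedure name GetDepartmentSummary is hypothetical) that wraps the view query for a single department:

DELIMITER //
CREATE PROCEDURE GetDepartmentSummary(IN deptName VARCHAR(100))
BEGIN
-- Return the aggregated figures for one department
SELECT DepartmentName, EmployeeCount, TotalSalary, AvgSalary
FROM EmployeeAggregates
WHERE DepartmentName = deptName;
END//
DELIMITER ;

-- Example call
CALL GetDepartmentSummary('HR');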

OUTPUT:
Combined Data from Employees Table (UNION ALL)

View Query Output


RESULT: Thus, tables were created in two distributed databases, data was inserted and combined using UNION and UNION ALL, and aggregated department information was retrieved through a view successfully.

Ex.No.: 3

OBJECT-ORIENTED DATABASE

Create OQL Queries to access the data from Object Oriented Database.

PROBLEM STATEMENT:
You are working with an object-oriented database designed to manage employee details for a
company. The database contains information about employees, departments, and projects.
You are required to write Object Query Language (OQL) queries to perform various data
retrieval tasks based on this database schema.

Assume the object-oriented database has the following classes:

1. Employee
o employeeID: Integer
o firstName: String
o lastName: String
o email: String
o jobTitle: String
o department: Department (Reference to a Department object)
o salary: Float
o hireDate: Date
o projects: List<Project> (Collection of Project objects)
2. Department
o departmentID: Integer
o departmentName: String
o manager: Employee (Reference to an Employee object)
3. Project
o projectID: Integer
o projectName: String
o startDate: Date
o endDate: Date
o teamMembers: List<Employee> (Collection of Employee objects)

1. Write an OQL query to retrieve all details of employees, including their ID, name, email, job title, and salary.
2. Write an OQL query to list all employees who belong to a specific department, such as "Engineering". Display their ID, name, and department name.
3. Write an OQL query to find all employees with a salary above $70,000. Display their ID, name, and salary.
4. Write an OQL query to find all employees who are managed by a specific manager, say the one with employeeID = 1002. Display their ID, name, and the manager's name.
5. Write an OQL query to find all employees involved in a project named "Project X". Display their ID, name, and project name.
6. Write an OQL query to count the number of employees in each department. Display the department name and the count of employees.
7. Write an OQL query to retrieve all employees who were hired after January 1, 2021. Display their ID, name, and hire date.
8. Write an OQL query to list all projects and their respective team members. For each project, display the project name and the names of all team members.

INTRODUCTION: Object Query Language (OQL) is a query language specifically designed for Object-Oriented Database Management Systems (OODBMS). It is used to query and manipulate objects stored in an OODBMS, much like how SQL is used in relational databases. OQL allows you to query data while leveraging object-oriented features such as inheritance, encapsulation, and polymorphism.
Key Concepts of OQL

1. Object-Oriented Data Model:


o Objects: In OODBMS, data is represented as objects, similar to objects in
programming languages like Java or C++.
o Classes and Inheritance: Objects are instances of classes, and classes can
inherit properties and methods from other classes.
o Relationships: Objects can have relationships with other objects, enabling
complex data structures.
2. OQL Structure:
o OQL syntax is similar to SQL but is designed to work with objects instead of
relational tables.
o It supports operations like selection, projection, joins, and aggregation, but
these operations are performed on objects and their properties.

Basic OQL Query Structure

An OQL query typically consists of the following components:

- SELECT Clause: Specifies the attributes or objects to retrieve.
- FROM Clause: Specifies the classes from which to retrieve objects.
- WHERE Clause: Specifies the conditions for selecting objects.

OBJECTIVES:

To write OQL queries that retrieve information about employees, departments, and projects from an object-oriented database.

APPLICATIONS:

- Computer-Aided Design (CAD) and Computer-Aided Manufacturing (CAM) Systems
- Geographic Information Systems (GIS)
- Telecommunications Network Management
- Medical Imaging and Healthcare Systems
- Enterprise Resource Planning (ERP) Systems
- Scientific Research and Simulations

STEP-BY-STEP EXPLANATION:

Setup and Configuration:
- Install the OODB system.
- Configure the system across multiple nodes.
- Ensure synchronization and communication between nodes.

Database Creation and Connection:
- Create a database instance.
- Set up and configure connections between distributed nodes.

Define Classes and Relationships:
- Create Employee and Department classes.
- Define properties and relationships between classes.

Insert Records:
- Add records to the Employee and Department classes.

Data Retrieval:
- Query for all employees.
- Query employees by department.
- Query employees by salary threshold.
- Query employees using relationships.
- Query employees sorted by salary.
- Query department-wise employee count.

Update Records:
- Increase salaries for employees in a specific department.
- Change an employee's department.

Delete Records:
- Delete a specific employee record.
- Delete all employees from a department.

AIM: To write OQL queries that retrieve various data from an object-oriented database
containing information about employees, departments, and projects.

ALGORITHMS:

1. Retrieve All Details of Employees: Select all attributes from the Employee class.
2. List Employees in a Specific Department ("Engineering"): Filter employees by departmentName in the Department class.
3. Find Employees with Salary Above $70,000: Filter employees by salary.
4. Find Employees Managed by a Specific Manager (employeeID = 1002): Join Employee with Department and filter by manager ID.
5. Find Employees Involved in a Project Named "Project X": Join Employee with Project and filter by project name.
6. Count the Number of Employees in Each Department: Group by department name and count employees.
7. Retrieve Employees Hired After January 1, 2021: Filter employees by hire date.
8. List Projects and Their Respective Team Members: Join Project with Employee to get team members.
9. Find Employees Not Assigned to Any Project: Filter employees with no associated projects.
10. List Departments with Their Managers: Fetch department details and join with manager information.

IMPLEMENTATION:

1. Retrieve All Details of Employees:

SELECT e.employeeID, e.firstName, e.lastName, e.email, e.jobTitle, e.salary
FROM Employee e

2. List Employees in a Specific Department ("Engineering"):

SELECT e.employeeID, e.firstName, e.lastName, e.department.departmentName

FROM Employee e

WHERE e.department.departmentName = 'Engineering'

3. Find Employees with Salary Above $70,000:

SELECT e.employeeID, e.firstName, e.lastName, e.salary
FROM Employee e
WHERE e.salary > 70000

4. Find Employees Managed by a Specific Manager (employeeID = 1002):

SELECT e.employeeID, e.firstName, e.lastName,
e.department.manager.firstName + ' ' + e.department.manager.lastName AS managerName
FROM Employee e
WHERE e.department.manager.employeeID = 1002

5. Find Employees Involved in a Project Named "Project X":

SELECT e.employeeID, e.firstName, e.lastName, p.projectName
FROM Employee e
JOIN e.projects p
WHERE p.projectName = 'Project X'

6. Count the Number of Employees in Each Department:

SELECT e.department.departmentName, COUNT(e.employeeID) AS employeeCount
FROM Employee e
GROUP BY e.department.departmentName

7. Retrieve Employees Hired After January 1, 2021:

SELECT e.employeeID, e.firstName, e.lastName, e.hireDate
FROM Employee e
WHERE e.hireDate > DATE('2021-01-01')

8. List Projects and Their Respective Team Members:

SELECT p.projectName, e.firstName, e.lastName
FROM Project p
JOIN p.teamMembers e

9. Find Employees Not Assigned to Any Project

SELECT e.employeeID, e.firstName, e.lastName, e.email
FROM Employee e
WHERE NOT EXISTS (
SELECT 1
FROM e.projects p
)

10. List Departments with Their Managers:

SELECT d.departmentName, d.manager.firstName + ' ' + d.manager.lastName AS managerName
FROM Department d

OUTPUT:
1. Retrieve All Details of Employees :

2. List Employees in a Specific Department ("Engineering"):

3. Find Employees with Salary Above $70,000:

4. Find Employees Managed by a Specific Manager (employeeID = 1002):

5. Find Employees Involved in a Project Named "Project X":

6. Count the Number of Employees in Each Department:

7. Retrieve Employees Hired After January 1, 2021:


8. List Projects and Their Respective Team Members:

9. Find Employees Not Assigned to Any Project:

10. List Departments with Their Managers:

RESULT: Thus, OQL queries that retrieve data about employees, departments, and projects from an object-oriented database have been written and executed successfully.
Ex.No.: 4

PARALLEL DATABASE

Access database from a programming language such as Python

PROBLEM STATEMENT:

1. Install and configure a parallel database system and a Python development environment (e.g., Apache HBase).
2. Write Python code to establish a connection to the parallel database.
3. Execute basic queries against the parallel database:
o Write Python code to execute INSERT, SELECT, UPDATE, and DELETE operations.
o Use parameterized access where the client library supports it (HBase has no direct equivalent of prepared statements).
o Display the results of SELECT operations in the console.
4. Perform parallel data operations in the database:
o Write Python code to perform operations that can leverage parallelism (e.g., batch inserts or parallel queries).
o Use Python's concurrency utilities (e.g., concurrent.futures.ThreadPoolExecutor) to manage parallel tasks.
o Measure and compare the performance of parallel operations versus sequential operations.
5. Optimize database queries for performance.

INTRODUCTION:

Parallel databases are designed to handle large volumes of data by distributing the workload
across multiple processors or servers. This architecture allows for faster query processing,
improved scalability, and better fault tolerance compared to traditional single-node databases.

Key Concepts in Parallel Databases

1. Data Partitioning: Data is split across multiple disks or nodes. Common partitioning
methods include:
o Horizontal Partitioning: Dividing rows across different nodes.
o Vertical Partitioning: Dividing columns across different nodes.
o Hash Partitioning: Distributing data based on a hash function (see the SQL sketch after this list).
o Range Partitioning: Dividing data based on a range of values.
2. Parallel Query Execution: Queries are processed simultaneously by different
processors or nodes. Techniques include:
o Intra-query parallelism: Breaking a single query into sub-tasks that run in
parallel.
o Inter-query parallelism: Running multiple queries simultaneously across
different processors.
3. Load Balancing: Ensuring that data and query processing is evenly distributed across
nodes to prevent bottlenecks.
4. Fault Tolerance: If one node fails, the system can continue processing using the
remaining nodes, often with data replication to ensure no data is lost.
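To make the hash-partitioning idea concrete in SQL terms, a conceptual sketch (MySQL syntax; the lab itself uses HBase, which shards data by row key instead):

-- Rows are spread across 4 partitions by hashing EmployeeID
CREATE TABLE EmployeesPart (
EmployeeID INT NOT NULL,
Name VARCHAR(50),
HireDate DATE
)
PARTITION BY HASH(EmployeeID) PARTITIONS 4;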

OBJECTIVES:

To develop an application in Java that can efficiently interact with a parallel database system
to perform various data operations, including querying, updating, and managing data across
distributed nodes.

APPLICATION:

- Data Warehousing
- Big Data Analytics
- Scientific Research
- Financial Services
- Telecommunications
- Health Care

STEP-BY-STEP EXPLANATION:

1. Schema Design

2. Sample Data Insertion:

3. Parallel Setup
- Sharding: Distribute the tables across different nodes based on UnitID or other criteria to balance the load.
- Replication: Implement replication to ensure fault tolerance and high availability.

4. Basic Queries

- Retrieve all personnel information, including their unit names and roles.
- Find all missions conducted by a specific unit, including details about the personnel assigned to those missions and the equipment used.
- Calculate the total number of missions conducted by each unit and the average duration of these missions.
- Generate a report listing the top 5 units with the highest number of missions, including the total number of missions and average mission duration for each unit.

Aim:

- To set up and configure Apache HBase as a parallel database system.
- To perform basic CRUD operations using Python.
- To leverage parallelism in data operations.
- To optimize database queries for performance.

Algorithm:

- Setup and Configuration: Install Apache HBase and configure it with the Python environment.
- Connection Establishment: Write Python code to connect to HBase.
- CRUD Operations: Implement Python code to perform INSERT, SELECT, UPDATE, and DELETE operations.
- Parallel Operations: Use concurrency utilities to perform batch operations and measure performance.
- Query Optimization: Apply optimization techniques to enhance query performance.

IMPLEMENTATIONS:
Install and Configure Apache HBase and Python Development Environment

1. Install Apache HBase:
o Download Apache HBase from the official website.
o Follow the installation instructions in the HBase Quick Start Guide.
o Configure hbase-site.xml with the necessary settings (e.g., HDFS configuration, ZooKeeper settings).
2. Install Python and Required Libraries:
o Ensure Python 3.x is installed. You can download it from Python's website.
o Install the happybase library, which is a Python client for HBase:

pip install happybase

3. Set Up the Python Development Environment:
o Use an IDE like PyCharm or VSCode.
o Ensure happybase and thrift are installed and configured to interact with HBase.

3. Write Python Code to Establish a Connection and Execute Basic Queries

1. Establish Connection:

import happybase

# Connect to the HBase Thrift server (default port 9090)
connection = happybase.Connection('localhost', port=9090)
connection.open()
print("Connection established.")

2. Execute Basic Queries:
o Insert Data:

table_name = 'my_table'
table = connection.table(table_name)

# Insert data
table.put(b'row1', {b'cf1:col1': b'value1'})
print("Data inserted.")

o Select Data:

# Fetch data
row = table.row(b'row1')
print("Retrieved value:", row[b'cf1:col1'].decode('utf-8'))

o Update Data: (Use put to overwrite existing data)

# Update data
table.put(b'row1', {b'cf1:col1': b'new_value'})
print("Data updated.")

o Delete Data:

# Delete data
table.delete(b'row1')
print("Data deleted.")

o Use Prepared Statements: Not directly applicable in HBase; parameterized access is handled in a different manner by the client library.
o Display Results:

# Results are displayed using print statements in the above code.

4. Perform Parallel Data Operations

1. Batch Inserts:

from concurrent.futures import ThreadPoolExecutor
import happybase

def insert_row(row_id):
    # Write one row; happybase Table.put takes bytes for row key and values
    table.put(f'row{row_id}'.encode(), {b'cf1:col1': f'value{row_id}'.encode()})

table_name = 'my_table'
table = connection.table(table_name)

# Run 100 inserts across 4 worker threads
with ThreadPoolExecutor(max_workers=4) as executor:
    executor.map(insert_row, range(100))
print("Batch insert completed.")

2. Parallel Queries:

from concurrent.futures import ThreadPoolExecutor
import happybase

def query_row(row_id):
    # Read one row and decode the stored value (with a fallback if the row is missing)
    row = table.row(f'row{row_id}'.encode())
    return row.get(b'cf1:col1', b'No data').decode('utf-8')

table_name = 'my_table'
table = connection.table(table_name)

# Fetch 10 rows across 4 worker threads
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(query_row, range(10)))
for result in results:
    print("Query result:", result)

3. Measure and Compare Performance:

import time

# Sequential operation
start_time = time.time()
for i in range(100):
    insert_row(i)
sequential_time = time.time() - start_time
print(f"Sequential execution time: {sequential_time:.2f} seconds")

# Parallel operation
start_time = time.time()
with ThreadPoolExecutor(max_workers=4) as executor:
    executor.map(insert_row, range(100))
parallel_time = time.time() - start_time
print(f"Parallel execution time: {parallel_time:.2f} seconds")

5. Optimize Database Queries for Performance

1. Use Efficient Row Keys: Design row keys to ensure even distribution of data.
2. Optimize Column Families: Minimize the number of column families and use them
effectively.
3. Tune HBase Configuration: Adjust settings for block cache, memstore, and other
parameters.
4. Monitor Performance: Utilize HBase metrics and monitoring tools to analyze
performance.

OUTPUT:
RESULT: Thus, the design and implementation of a parallel database system to manage defense-related information, including personnel, missions, and equipment, has been successfully executed.
Ex.No.: 5

ACTIVE DATABASES

Create an active database with facts and extract data using rules.

PROBLEM STATEMENT:
To create an active database that stores facts about a domain (e.g., a company’s employees,
departments, and projects) and to define rules that derive new facts or retrieve specific
information based on stored data.
i. Create and test the facts.
ii. Create and test the rules.
iii. Create and test the complex rules.
iv. Insert and delete facts dynamically and test the dynamic facts.

INTRODUCTION:
An active database is not just a static repository of data; it also includes a set of rules that
automatically trigger actions when certain conditions are met. This type of database is highly
interactive, allowing the data to "work" for you by deriving new facts, enforcing constraints,
or automating tasks based on predefined rules.
What Are Facts and Rules?
- Facts represent the fundamental units of knowledge within the database. They are
assertions about the world that the database knows to be true. For example, in a
company database, facts might include information like "John is a manager" or "The
Sales department is located in New York." Facts are the data points from which more
complex queries and inferences can be drawn.
- Rules are logical statements that define how new information can be derived from the
existing facts. They encapsulate the logic of your domain, enabling the database to
infer new knowledge or respond to queries dynamically. For example, a rule might
state, "If someone is a manager, they are eligible for a promotion." When you query
the database, it will use this rule to determine which employees are eligible for
promotions based on the current facts.
Why Use an Active Database?
Active databases, particularly those implemented in logic programming languages like
Prolog, offer several advantages:
1. Inferred Knowledge: They can infer new knowledge from existing data, which
allows for more powerful queries and deeper insights.
2. Declarative Logic: The use of declarative rules makes it easier to express complex
relationships and business logic compared to traditional procedural code.
3. Dynamic Updates: The database can automatically respond to changes in the data,
dynamically updating the derived information without requiring manual intervention.
4. Simplified Querying: Complex queries that would be difficult to write in SQL can
often be expressed more naturally and succinctly using rules.

APPLICATIONS:

1. Real-Time Monitoring and Alerting
2. Business Process Automation
3. Enforcement of Business Rules
4. Auditing and Compliance
5. Dynamic Data Integration
6. Reactive User Interfaces

STEP-BY-STEP EXPLANATION:

i. Create and Test the Facts:


1. Schema Design:
o Create tables: Employees, Departments, Projects.
o Example schema:
- Employees (EmpID, Name, DeptID, ProjectID, Salary)
- Departments (DeptID, DeptName, ManagerID)
- Projects (ProjectID, ProjectName, Budget)
2. Insert Initial Facts:
o Insert sample data into the tables.
ii. Create and Test the Rules:
1. Define Simple Rules:
o Example rule: Automatically update the department manager's salary by 10% if the department's budget exceeds a certain amount.
o Implement the rule using triggers or stored procedures.
2. Test the Rules:
o Update a project's budget and verify that the manager's salary is updated.
iii. Create and Test the Complex Rules:
1. Define Complex Rules:
o Example complex rule: If a project's budget is reduced by more than 20%, reduce the salaries of all employees working on that project by 10%.
o Implement the rule with triggers or stored procedures.
2. Test the Complex Rules:
o Update a project's budget and verify that the employees' salaries are reduced appropriately.
iv. Insert and Delete Facts Dynamically and Test the Dynamic Facts:
1. Dynamic Fact Insertion:
o Insert new employees, departments, or projects dynamically and verify rule execution.
2. Dynamic Fact Deletion:
o Delete employees or projects and verify that rules handle these deletions correctly.
3. Testing Dynamic Changes:
o Insert or delete facts, then query the database to see the derived facts or changes.
o Ensure that rule evaluations are consistent with the current state of the database.

AIM:

To create an active database that stores facts about a company's domain (e.g., employees,
departments, and projects) and to define rules that derive new facts or retrieve specific
information based on stored data.

ALGORITHMS:

Design the Schema:

- Define the schema for the active database, including tables for employees, departments, and projects.
- Identify key attributes and relationships among the tables.

Define and Store Facts:

- Populate the database with initial facts about employees, departments, and projects.
- Implement mechanisms to dynamically insert and delete facts.
Define Rules:

- Create rules to derive new facts or perform specific operations when certain conditions are met.
- Ensure that rules are automatically triggered when related facts are inserted, updated, or deleted.

Test Rules and Complex Rules:

- Validate the behavior of simple rules.
- Design and test complex rules that involve multiple conditions and actions.

Dynamic Fact Management:

- Insert and delete facts dynamically.
- Ensure that rule evaluations are performed correctly based on the changes.

IMPLEMENTATIONS:

i. Create and Test the Facts:

1. Schema Design:
o Create tables: Employees, Departments, Projects.
o Example schema:
- Employees (EmpID, Name, DeptID, ProjectID, Salary)
- Departments (DeptID, DeptName, ManagerID)
- Projects (ProjectID, ProjectName, Budget)
2. Insert Initial Facts:
o Insert sample data into the tables.

Example:

INSERT INTO Employees (EmpID, Name, DeptID, ProjectID, Salary)

VALUES (1, 'Alice', 101, 201, 70000);

o Verify the insertion of facts by querying the tables.

ii. Create and Test the Rules:

1. Define Simple Rules:
o Example rule: Automatically update the department manager's salary by 10% if the department's budget exceeds a certain amount.
o Implement the rule using triggers or stored procedures.

Example:

CREATE TRIGGER UpdateManagerSalary
AFTER UPDATE ON Projects
FOR EACH ROW
WHEN (NEW.Budget > 100000)
BEGIN
UPDATE Employees
SET Salary = Salary * 1.10
WHERE EmpID = (SELECT ManagerID FROM Departments WHERE DeptID = NEW.DeptID);
END;
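Note: the WHEN clause above follows Oracle/SQLite trigger conventions. In MySQL the condition moves into the trigger body; a sketch of the same rule in MySQL syntax:

DELIMITER //
CREATE TRIGGER UpdateManagerSalary
AFTER UPDATE ON Projects
FOR EACH ROW
BEGIN
-- Raise the manager's salary by 10% when the new budget exceeds 100,000
IF NEW.Budget > 100000 THEN
UPDATE Employees
SET Salary = Salary * 1.10
WHERE EmpID = (SELECT ManagerID FROM Departments WHERE DeptID = NEW.DeptID);
END IF;
END//
DELIMITER ;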

Test the Rules:

- Update a project's budget and verify that the manager's salary is updated.

Example

UPDATE Projects SET Budget = 120000 WHERE ProjectID = 201;

Create and Test the Complex Rules:

1. Define Complex Rules:
o Example complex rule: If a project's budget is reduced by more than 20%, reduce the salaries of all employees working on that project by 10%.
o Implement the rule with triggers or stored procedures.

Example:

CREATE TRIGGER ReduceEmployeeSalary
AFTER UPDATE ON Projects
FOR EACH ROW
WHEN (((OLD.Budget - NEW.Budget) / OLD.Budget) > 0.20)
BEGIN
UPDATE Employees
SET Salary = Salary * 0.90
WHERE ProjectID = NEW.ProjectID;
END;

Test the Complex Rules:

- Update a project's budget and verify that the employees' salaries are reduced appropriately.

Example

UPDATE Projects SET Budget = 80000 WHERE ProjectID = 201;

Insert and Delete Facts Dynamically and Test the Dynamic Facts:

1. Dynamic Fact Insertion:
o Insert new employees, departments, or projects dynamically and verify rule execution.

Example

INSERT INTO Employees (EmpID, Name, DeptID, ProjectID, Salary)

VALUES (2, 'Bob', 101, 201, 60000);

Dynamic Fact Deletion:

- Delete employees or projects and verify that rules handle these deletions correctly.

Example

DELETE FROM Employees WHERE EmpID = 2;

Testing Dynamic Changes:

- Insert or delete facts, then query the database to see the derived facts or changes.
- Ensure that rule evaluations are consistent with the current state of the database.
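For instance, a quick check against the sample rows used above (project 201) shows whether the triggers fired as expected:

-- Inspect current salaries for the project touched by the budget updates
SELECT EmpID, Name, Salary FROM Employees WHERE ProjectID = 201;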

OUTPUT:

Initial Insertion:
After Rule Trigger (Budget Update):

After Complex Rule Trigger (Budget Reduction):

RESULT: Thus, an active database storing facts and rules was created, and simple, complex, and dynamic rule behaviour was tested successfully.

Ex.No.: 6

DEDUCTIVE DATABASE

Create a knowledge database with facts and extract data using rules

Problem Statement: Design and implement a deductive database to store and retrieve
information about humans and their characteristics. The database should support storing
personal information, characteristics, and relationships, and allow for inferencing to derive
new facts based on predefined rules.

Tasks:

1. Database Setup and Schema Design

1. Design the Schema:
o Create a Datalog schema with the following predicates:
- human(ID, Name, Age): Represents a person with an identifier, name, and age.
- characteristic(ID, Characteristic): Represents a characteristic associated with a person.
- relationship(ID1, ID2, RelationshipType): Represents a relationship between two individuals.

Sample Data Insertion:

- Populate the database with sample data. For example:

Implementing Inference Rules

1. Define Inference Rules:
o Implement rules to infer new facts. For example:
- Rule 1: Infer that two humans are related if one is a colleague of the other.
- Rule 2: Infer that a person has a certain characteristic based on their age.

Query the Database:
- Write queries to test the inference rules. For example:
o Query to find all people with the characteristic 'Experienced'.
o Query to find all related individuals based on the relationship 'Colleague'.

Write complex queries combining multiple rules. For example:
- Query to find all friends who have a certain characteristic.

Implement and test recursive rules to infer relationships or characteristics that involve multiple steps. For example:
- Recursive Rule: Infer whether two people are indirectly related through a common friend.

INTRODUCTION:

A Knowledge Database (also known as a Deductive Database) extends the functionality of traditional databases by incorporating reasoning capabilities. It stores facts and rules and can infer new information from these through logical deduction. The primary query language used in deductive databases is Datalog, a subset of Prolog.

STEP-BY-STEP PROCESS:

1. Understand the Basics: Facts and Rules

- Facts: Basic assertions about the world, similar to tuples in a relational database.
- Rules: Logical statements that define relationships between facts, allowing the inference of new facts.

2. Choose a Deductive Database System

Several systems support Datalog or similar languages:

- XSB: A logic programming and deductive database system.
- SWI-Prolog: A Prolog environment with support for deductive database features.
- LogicBlox: A platform for building and querying deductive databases.

For this guide, let's assume we're using SWI-Prolog.

3. Install and Set Up the Environment

1. Download and Install SWI-Prolog:
o Visit the SWI-Prolog website and download the installer for your operating system.
o Follow the installation instructions.
2. Open SWI-Prolog:
o Launch the SWI-Prolog interactive environment.

4. Create the Knowledge Base (KB)

In Prolog, the knowledge base is a file that contains facts and rules. Create a file named
birds.pl:

- Birds: Sparrows, eagles, penguins, and ostriches.
- Characteristics: Which birds can fly, and general rules such as "all birds have feathers" and "all birds lay eggs".

5. Load the Knowledge Base into SWI-Prolog

1. Load the File:
o At the SWI-Prolog prompt, consult the knowledge base, e.g. ?- consult('birds.pl').
2. Check for Errors:
o Ensure there are no syntax errors in your KB.

APPLICATIONS:

1. Expert Systems
- Medical Diagnosis
- Legal Reasoning

2. Data Integration
- Semantic Data Integration
- Ontology Management

3. Business Rule Management
- Fraud Detection
- Compliance and Policy Enforcement

4. Natural Language Processing (NLP)
- Semantic Parsing
- Information Extraction

5. Artificial Intelligence and Machine Learning
- Reasoning in AI
- Explanation Generation

6. Semantic Web
- RDF and OWL Reasoning
- Linked Data Querying

7. Cognitive Computing
AIM:

To create a deductive database that stores and retrieves information about humans, their
characteristics, and their relationships, and to support inference to derive new facts from
predefined rules.

ALGORITHMS:

1. Database Schema Creation:
o Define predicates to represent the database schema.
o Create rules for inferencing based on the stored data.
2. Data Insertion:
o Insert sample data into the database.
3. Query Execution:
o Define queries to retrieve specific information.
o Implement recursive rules for complex inferencing.

IMPLEMENTATION:

Assuming you are using a Prolog-like logic programming language for the deductive
database, the implementation might look like this:

Datalog Schema

% Predicate Definitions
human(ID, Name, Age).
characteristic(ID, Characteristic).
relationship(ID1, ID2, RelationshipType).

Inserting the data:

% Sample Data
human(1, 'Alice', 30).
human(2, 'Bob', 25).
human(3, 'Charlie', 35).
human(4, 'Diana', 28).
human(5, 'Eve', 40).

characteristic(1, 'Experienced').
characteristic(2, 'Beginner').
characteristic(3, 'Experienced').
characteristic(4, 'Experienced').
characteristic(5, 'Senior').

relationship(1, 2, 'Colleague').
relationship(2, 3, 'Friend').
relationship(3, 4, 'Colleague').
relationship(4, 5, 'Friend').

Rules for Inferencing

1. Rule 1: Infer if two humans are related if one is a colleague of the other.

related(X, Y) :- relationship(X, Y, 'Colleague').
related(X, Y) :- relationship(X, Z, 'Colleague'), relationship(Z, Y, 'Colleague').

2. Rule 2: Infer if a person has a certain characteristic based on their age.

has_characteristic(ID, 'Experienced') :- human(ID, _, Age), Age > 30.

Queries

1. Query to find all people with the characteristic 'Experienced':

?- human(ID, Name, _), characteristic(ID, 'Experienced').

2. Query to find all related individuals based on the relationship 'Colleague':

?- related(ID1, ID2).

3. Query to find all friends who have a certain characteristic:

?- relationship(ID1, ID2, 'Friend'), characteristic(ID2, Characteristic), Characteristic = 'Experienced'.

Recursive Rule for Indirect Relationships

% Recursive rule: infer whether two people are indirectly related through a chain of friends
indirectly_related(X, Y) :- relationship(X, Z, 'Friend'), relationship(Z, Y, 'Friend').
indirectly_related(X, Y) :- relationship(X, Z, 'Friend'), indirectly_related(Z, Y).

OUTPUT:

Queries and Results:

1. People with the characteristic 'Experienced':


2. Related individuals based on the relationship 'Colleague':

3. Friends who have the characteristic 'Experienced':

4. Indirectly related individuals through a common friend:

RESULT: Thus, a deductive database storing facts and rules about humans, their characteristics, and their relationships was created and queried, with new facts inferred successfully.
Ex.No.: 7

ETL TOOL

To extract data from your transactional system to create a consolidated data warehouse or data mart for reporting and analysis.

Problem Statement:

Enhance the ETL process to include data aggregation and enrichment. The goal is to extract,
transform, and load data, while also performing aggregations and adding additional
information to enrich the data set.

Tasks:

1. Source Data Preparation:
o Source 1: CSV file containing production data (same as Exercise 1).
o Source 2: SQL database containing sales data (same as Exercise 1).
2. Design Target Schema with Aggregation:
o Target Table 1: Production
o Target Table 2: Sales
o Target Table 3: MonthlySalesSummary
- Fields: ProductID, Month, TotalQuantitySold, TotalSaleAmount
3. ETL Tool Setup:
o Extract: Import data from sources.
o Transform:
- Aggregate sales data by month and product to populate MonthlySalesSummary.
- Enrich production data with additional information, such as adding a ProductCategory based on predefined rules.
o Load: Insert data into the Production, Sales, and MonthlySalesSummary tables.
4. Validation:
o Verify the aggregation results in the MonthlySalesSummary table for accuracy.

INTRODUCTION:
ETL (Extract, Transform, Load) tools are software applications used to manage the process
of extracting data from various sources, transforming it into a suitable format, and loading it
into a target database or data warehouse. ETL processes are fundamental in data integration,
allowing businesses to consolidate data from different sources for analysis, reporting, and
decision-making

Step-by-Step Guide to Enhance the ETL Process with Data Aggregation and
Enrichment

1. Source Data Preparation

Source 1: CSV File Containing Production Data

- Content: Information about products being manufactured, including ProductID, ProductionDate, QuantityProduced, etc.
- Location: Stored in a file directory.

Source 2: SQL Database Containing Sales Data

- Content: Sales records, including ProductID, SaleDate, QuantitySold, SaleAmount, etc.
- Location: Stored in a SQL database.

2. Design Target Schema with Aggregation

Target Table 1: Production

- Fields: ProductID, ProductionDate, QuantityProduced, ProductCategory

Target Table 2: Sales

- Fields: ProductID, SaleDate, QuantitySold, SaleAmount

Target Table 3: MonthlySalesSummary

- Fields: ProductID, Month, TotalQuantitySold, TotalSaleAmount
- Description: This table will store aggregated sales data, summarized by month and product.

3. ETL Tool Setup

Let's assume you're using a common ETL tool like Talend, Apache NiFi, or Pentaho Data
Integration (PDI). The steps will be similar across tools, with differences mainly in the user
interface.

1. Extract: Import Data from Sources

- Production Data (CSV) Import:
o Create an Input component to read the CSV file.
o Configure the path to the CSV file and define the schema (e.g., ProductID, ProductionDate, QuantityProduced).
- Sales Data (SQL) Import:
o Create an Input component to connect to the SQL database.
o Write an SQL query to extract the necessary fields (e.g., ProductID, SaleDate, QuantitySold, SaleAmount).

2. Transform: Data Aggregation and Enrichment

- Step 1: Aggregate Sales Data by Month and Product
o Use an Aggregation component to group sales data by ProductID and Month (extracted from SaleDate).
o Calculate TotalQuantitySold and TotalSaleAmount for each product in each month.
- Step 2: Enrich Production Data
o Define rules to assign a ProductCategory based on the ProductID or other attributes.
o Use a Lookup/Mapping component to add the ProductCategory field to the production data.

3. Load: Insert Data into Target Tables

- Load Production Data:
o Create an Output component to load the transformed production data into the Production table in your target database.
- Load Sales Data:
o Create an Output component to load the sales data into the Sales table in your target database.
- Load Monthly Sales Summary:
o Create an Output component to load the aggregated data into the MonthlySalesSummary table.

4. Validation

- Step 1: Verify Aggregation Results
o Query the MonthlySalesSummary table in the target database.
o Compare the results with the original sales data to ensure the aggregation was performed correctly.
- Step 2: Check Data Enrichment
o Verify that the ProductCategory has been correctly added to the production data based on the predefined rules.

APPLICATIONS:

1. Data Warehousing

2. Business Intelligence and Reporting

3. Data Migration
4. Data Integration

5. Big Data Processing

6. Customer Relationship Management (CRM)

AIM: Enhance the ETL (Extract, Transform, Load) process to not only load data from
multiple sources but also perform aggregations and enrich the dataset with additional
information.

ALGORITHMS:

Source Data Preparation:

- Source 1: CSV file with production data.
- Source 2: SQL database with sales data.

Design Target Schema:

- Target Table 1: Production
o Fields: ProductID, ProductName, ProductCategory
- Target Table 2: Sales
o Fields: SaleID, ProductID, SaleDate, QuantitySold, SaleAmount
- Target Table 3: MonthlySalesSummary
o Fields: ProductID, Month, TotalQuantitySold, TotalSaleAmount

ETL Tool Setup:

- Extract:
o Import data from the CSV file and SQL database.
- Transform:
o Aggregation: Aggregate sales data from the SQL database by month and product to calculate TotalQuantitySold and TotalSaleAmount.
o Enrichment: Enrich the production data by adding a ProductCategory based on predefined rules or external data.
- Load:
o Insert the transformed and aggregated data into the Production, Sales, and MonthlySalesSummary tables.

Validation:

- Verify that the data in the MonthlySalesSummary table is correct by checking aggregation results and comparing them against raw sales data.

IMPLEMENTATION:
Extract Data:

- CSV Extraction:

import pandas as pd

# Load CSV data into a DataFrame
production_df = pd.read_csv('production_data.csv')

- SQL Extraction:

import pandas as pd
import sqlalchemy
# Connect to SQL database
engine = sqlalchemy.create_engine('mysql+pymysql://user:password@host/dbname')
sales_df = pd.read_sql('SELECT * FROM sales_data', engine)

Transform Data:

- Aggregation:

# Convert SaleDate to datetime and derive a month period
sales_df['SaleDate'] = pd.to_datetime(sales_df['SaleDate'])
sales_df['Month'] = sales_df['SaleDate'].dt.to_period('M')

# Aggregate sales data by product and month
monthly_summary = sales_df.groupby(['ProductID', 'Month']).agg(
    TotalQuantitySold=pd.NamedAgg(column='QuantitySold', aggfunc='sum'),
    TotalSaleAmount=pd.NamedAgg(column='SaleAmount', aggfunc='sum')
).reset_index()
- Enrichment:

# Define enrichment rules (example)
category_rules = {
    'P001': 'Electronics',
    'P002': 'Clothing',
    # Add more rules as needed
}
production_df['ProductCategory'] = production_df['ProductID'].map(category_rules)

Load Data:

- Load Data to SQL Database:

# Load Production, Sales, and MonthlySalesSummary data
production_df.to_sql('Production', engine, if_exists='replace', index=False)
sales_df.to_sql('Sales', engine, if_exists='replace', index=False)
monthly_summary.to_sql('MonthlySalesSummary', engine, if_exists='replace', index=False)

Validation:
- Check Aggregation Results:

# Query and validate MonthlySalesSummary
summary_check = pd.read_sql('SELECT * FROM MonthlySalesSummary', engine)
print(summary_check.head())
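As an independent cross-check, the same aggregation can be recomputed in SQL directly against the loaded Sales table and compared row by row with MonthlySalesSummary (MySQL syntax; column names as loaded above):

-- Recompute monthly totals from the raw sales rows
SELECT ProductID,
DATE_FORMAT(SaleDate, '%Y-%m') AS Month,
SUM(QuantitySold) AS TotalQuantitySold,
SUM(SaleAmount) AS TotalSaleAmount
FROM Sales
GROUP BY ProductID, DATE_FORMAT(SaleDate, '%Y-%m');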

OUTPUT:

Sample Output

1. Production Table:

2. Sales Table:

3. MonthlySalesSummary Table:

RESULT: Thus, the ETL process was enhanced to load data from multiple sources, perform aggregations, and enrich the dataset with additional information, and was executed successfully.
Ex.No.: 8

ORACLE DATABASE

Store and retrieve voluminous data using SanssouciDB / Oracle DB.

PROBLEM STATEMENT:

This lab simulates managing a large dataset of employee information in a company where
records are maintained separately for different departments (employee_department1 and
employee_department2). Due to organizational requirements, it is necessary to combine these
records and retrieve specific information based on various criteria, such as age or salary. The
goal is to demonstrate efficient data management and retrieval techniques using Oracle DB
when working with voluminous data.

 Create two tables to store employee information.


 Insert a large amount of data into these tables.
 Combine the data from both tables using a union operation.
 Retrieve and display the combined data.

INTRODUCTION:

Oracle Database is a robust and widely used relational database management system
(RDBMS) known for its scalability, performance, and security features. It supports a wide
range of data management tasks, including storing, retrieving, and manipulating large
volumes of data. Oracle's SQL language and advanced features like PL/SQL, partitioning,
and indexing make it a popular choice for enterprise applications.

STEP – BY – STEP PROCESS:

1. Setup Oracle Database Environment

Ensure you have access to an Oracle Database instance. You can use Oracle SQL Developer
or SQL*Plus for executing SQL queries.

2. Create Tables for Employee Information

You will create two tables to simulate storing employee records separately for different
departments.

3. Insert a Large Amount of Data into the Tables


To simulate a large dataset, you'll insert a significant number of records into both tables.

4. Combine Data from Both Tables Using a UNION Operation

The UNION operation will combine the records from both tables. The UNION operator
removes duplicates by default, but you can use UNION ALL if duplicates are not a concern.

5. Retrieve and Display the Combined Data

You can apply various SQL queries to retrieve specific information from the combined
dataset.

APPLICATIONS:

1. Enterprise Resource Planning (ERP)

2. Customer Relationship Management (CRM)

3. Data Warehousing

4. E-commerce

5. Banking and Financial Services

6. Healthcare

7. Government and Public Sector

8. Telecommunications

AIM:

To demonstrate how to efficiently manage and retrieve a large dataset of employee
information stored in separate tables for different departments using Oracle DB.

ALGORITHM:

Create Tables:
 Create two tables, employee_department1 and employee_department2, to store
employee data separately for different departments.

Insert Data:

 Insert a large amount of sample employee data into these tables to simulate a real-
world scenario with voluminous data.

Display Sample Data:

 Retrieve and display a subset of the data from both tables to verify that the data has
been correctly inserted.

Combine Data:

 Use the SQL UNION operation to combine the data from both tables. This operation
eliminates duplicate records and provides a unified view of all employee data.

Retrieve and Display Combined Data:

 Retrieve and display the combined data, applying filters to demonstrate data retrieval
based on specific criteria such as age or salary.

Output for Specific Criteria:

 Retrieve and display data for employees aged between 30 and 40.
 Retrieve and display data for employees with a salary greater than 60,000.

IMPLEMENTATION:

Step 1: Create Tables

CREATE TABLE employee_department1 (
    employee_id NUMBER PRIMARY KEY,
    first_name VARCHAR2(50),
    last_name VARCHAR2(50),
    age NUMBER,
    salary NUMBER,
    department VARCHAR2(50)
);

CREATE TABLE employee_department2 (
    employee_id NUMBER PRIMARY KEY,
    first_name VARCHAR2(50),
    last_name VARCHAR2(50),
    age NUMBER,
    salary NUMBER,
    department VARCHAR2(50)
);
Step 2: Insert Data
To simulate a large dataset, we will insert a substantial number of records into each table. The
following PL/SQL block inserts 50,000 records into each table:
BEGIN
    FOR i IN 1..50000 LOOP
        INSERT INTO employee_department1 (employee_id, first_name, last_name, age, salary, department)
        VALUES (i, 'First' || i, 'Last' || i, MOD(i, 60) + 20, MOD(i, 10000) + 30000, 'Department1');

        INSERT INTO employee_department2 (employee_id, first_name, last_name, age, salary, department)
        VALUES (i + 50000, 'First' || (i + 50000), 'Last' || (i + 50000), MOD(i + 50000, 60) + 20, MOD(i + 50000, 10000) + 30000, 'Department2');
    END LOOP;
    COMMIT;
END;
/
Step 3: Display Sample Data
Retrieve and display a small subset of the data from each table to verify that data insertion
was successful:
SELECT * FROM employee_department1 FETCH FIRST 10 ROWS ONLY;

SELECT * FROM employee_department2 FETCH FIRST 10 ROWS ONLY;


Step 4: Combine Data
Use the UNION operation to combine data from both tables. The UNION operation removes
duplicate rows, ensuring a unique combined dataset:
SELECT * FROM employee_department1
UNION
SELECT * FROM employee_department2;
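Since the insert script above gives the two departments disjoint employee_id ranges, duplicate elimination is unnecessary here, and UNION ALL can be used instead to avoid the sort overhead on large tables:

-- UNION ALL keeps all rows and skips the duplicate-elimination step,
-- which is noticeably cheaper on voluminous data
SELECT * FROM employee_department1
UNION ALL
SELECT * FROM employee_department2;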
Step 5: Retrieve and Display Combined Data
Retrieve and display combined data based on specific criteria. For example, to retrieve
employees older than 30 with a salary greater than 50,000:
SELECT * FROM (
    SELECT * FROM employee_department1
    UNION
    SELECT * FROM employee_department2
)
WHERE age > 30 AND salary > 50000;

Retrieve Employees Aged Between 30 and 40:

SELECT * FROM (
    SELECT * FROM employee_department1
    UNION
    SELECT * FROM employee_department2
)
WHERE age BETWEEN 30 AND 40;

Retrieve Employees with Salary Greater than 60,000:

SELECT * FROM (
    SELECT * FROM employee_department1
    UNION
    SELECT * FROM employee_department2
)
WHERE salary > 60000;
OUTPUT:

Table Creation Confirmation:

 The tables employee_department1 and employee_department2 are successfully created.

Sample Data Display:

 A subset of employee data from both employee_department1 and
employee_department2 tables is displayed, confirming correct data insertion.
Example output:

Combined Data Display:

 All unique employee records from both departments are displayed after performing
the UNION operation. Example output:

Filtered Data Display:

 A filtered set of combined data is displayed based on the specified criteria (e.g., age >
30 and salary > 50,000). Example output:
Output for Employees Aged Between 30 and 40:

Output for Employees with Salary Greater than 60,000:

RESULT: Thus, the demonstration of efficiently managing and retrieving a large dataset of
employee information stored in separate tables for different departments using Oracle DB
has been successfully executed.
Ex.No.: 9

NOSQL

Expose supermarket and genre information stored in Oracle NoSQL Database
and access NoSQL data from Oracle Database using SQL queries.

PROBLEM STATEMENT:

You are the database administrator for a manufacturing company that produces various
electronic components. The company wants to adopt a NoSQL database to handle its
dynamic and complex data requirements more efficiently, including inventory management,
employee information, and production tracking. Your task is to design a MongoDB database
that meets these requirements and perform various CRUD operations to demonstrate its
capabilities.

1. Understand the use of MongoDB in managing manufacturing data.


2. Learn to create and manipulate collections and documents in MongoDB.
3. Perform CRUD operations relevant to a manufacturing company scenario.
4. Use aggregation pipelines to analyze manufacturing data.
5. Apply indexing and schema design strategies to optimize queries.

INTRODUCTION:

NoSQL (Not Only SQL) databases are designed to handle large volumes of unstructured,
semi-structured, or structured data with high scalability and flexibility. Unlike traditional
relational databases that use SQL, NoSQL databases support various data models like key-
value, document, column-family, and graph. They are particularly useful in scenarios where
data needs to be distributed across many servers, or when the data model changes frequently,
such as in real-time analytics, content management, and Internet of Things (IoT) applications.

Oracle NoSQL Database is a distributed, highly scalable key-value database designed to
manage large amounts of unstructured and semi-structured data. It offers automatic sharding,
replication, and failover, providing high availability and fault tolerance. Oracle NoSQL
supports ACID transactions, and it can be accessed using APIs for Java, C, Python, and
REST.

STEP – BY – STEP PROCESS:

1. Setup Oracle NoSQL Database Environment

Before you begin, ensure you have access to an Oracle NoSQL Database instance. You can
run Oracle NoSQL on-premises, or you can use Oracle NoSQL Cloud Service.

2. Create and Populate NoSQL Tables


Let’s assume you have two datasets: one for supermarkets and another for genres of products
sold in those supermarkets.

 Supermarkets: Stores information about supermarket chains, including name,
location, and ratings.
 Genres: Stores information about different genres of products, such as “Fruits”,
“Dairy”, “Bakery”, etc.

a. Create Supermarkets Table

b. Create Genres Table
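A plausible DDL sketch for these two tables, using Oracle NoSQL's table model (all field names are illustrative assumptions):

CREATE TABLE supermarkets (
    id INTEGER,
    name STRING,
    location STRING,
    rating DOUBLE,
    PRIMARY KEY (id)
);

CREATE TABLE genres (
    genre_id INTEGER,
    genre_name STRING,
    PRIMARY KEY (genre_id)
);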

c. Insert Data into the Tables

You can insert records into these tables using Oracle NoSQL’s API. Below is an example in
Python:
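A minimal insert sketch using the Oracle NoSQL Python SDK (borneo), assuming an on-premises httpproxy endpoint at localhost:8080 and the illustrative supermarkets schema above:

from borneo import NoSQLHandle, NoSQLHandleConfig, PutRequest
from borneo.kv import StoreAccessTokenProvider

# Connect through a local httpproxy endpoint (assumed to be running)
config = NoSQLHandleConfig('http://localhost:8080')
config.set_authorization_provider(StoreAccessTokenProvider())
handle = NoSQLHandle(config)

# Insert one supermarket record (field values are illustrative)
put = PutRequest().set_table_name('supermarkets').set_value(
    {'id': 1, 'name': 'FreshMart', 'location': 'Chennai', 'rating': 4.5})
handle.put(put)
handle.close()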

3. Expose NoSQL Data to Oracle Database

Oracle Database supports integration with Oracle NoSQL Database, allowing you to expose
NoSQL data as relational tables. This can be achieved through Oracle SQL Access for
NoSQL.

4. Configure Oracle SQL Access for NoSQL

a. Install and configure Oracle SQL Access for NoSQL on your Oracle Database
instance.

b. Create an external table in Oracle Database that maps to the NoSQL table.

5. Access NoSQL Data Using SQL Queries

Once the external tables are created, you can run SQL queries on the NoSQL data as if it
were in Oracle Database.
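For example, assuming the external table was named supermarkets_ext (a hypothetical name), an ordinary relational query would then work directly:

-- Query NoSQL-backed data through the mapped external table
SELECT name, location, rating
FROM supermarkets_ext
WHERE rating > 4
ORDER BY rating DESC;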
APPLICATIONS:

1. Big Data Analytics

2. Content Management Systems (CMS)

3. E-commerce

4. Social Media and Social Networks

5. Internet of Things (IoT)

6. Mobile Applications

AIM:

To leverage MongoDB to efficiently manage and analyze data related to inventory, employee
information, and production tracking.

ALGORITHM:

Database Design:

 Collections: Define collections for Inventory, Employees, and Production.


 Documents: Structure documents in each collection to capture relevant data.

CRUD Operations:

 Create: Insert new records into each collection.


 Read: Retrieve and query records from the collections.
 Update: Modify existing records.
 Delete: Remove records from the collections.

Aggregation Pipelines:

 Use aggregation to analyze and summarize data (e.g., total inventory value, employee
performance metrics).

Indexing and Schema Design:

 Implement indexes to optimize query performance.


 Design schemas to handle evolving data requirements efficiently.

IMPLEMENTATION:

Schema Design:

 Define collections for Products, Employees, and ProductionRecords.


 Design document structure for each collection based on requirements.

Products Collection:

{
    "_id": ObjectId,
    "productName": "string",
    "category": "string",
    "price": "number",
    "stockQuantity": "number"
}

Employees Collection:

{
    "_id": ObjectId,
    "employeeName": "string",
    "position": "string",
    "hireDate": "ISODate",
    "salary": "number"
}

ProductionRecords Collection:

{
    "_id": ObjectId,
    "productId": ObjectId,
    "quantityProduced": "number",
    "productionDate": "ISODate"
}

2. CRUD Operations

Create:
// Connect to MongoDB
const db = connect('mongodb://localhost:27017/manufacturing');

// Insert a new product
db.Products.insertOne({
    productName: "Smartphone",
    category: "Electronics",
    price: 699.99,
    stockQuantity: 150
});

// Insert a new employee
db.Employees.insertOne({
    employeeName: "Alice Smith",
    position: "Engineer",
    hireDate: new Date("2024-01-15"),
    salary: 85000
});

// Insert a new production record
db.ProductionRecords.insertOne({
    productId: ObjectId("ProductObjectIdHere"),
    quantityProduced: 500,
    productionDate: new Date()
});
Read:

// Find all products
db.Products.find({}).toArray();

// Find a specific employee
db.Employees.findOne({ employeeName: "Alice Smith" });

// Find production records for a specific product
db.ProductionRecords.find({ productId: ObjectId("ProductObjectIdHere") }).toArray();
Update:

// Update stock quantity of a product
db.Products.updateOne(
    { productName: "Smartphone" },
    { $set: { stockQuantity: 200 } }
);

// Update employee salary
db.Employees.updateOne(
    { employeeName: "Alice Smith" },
    { $set: { salary: 90000 } }
);
Delete:

// Delete a product
db.Products.deleteOne({ productName: "Smartphone" });

// Delete an employee
db.Employees.deleteOne({ employeeName: "Alice Smith" });

3. Aggregation Pipelines

Example: Total Quantity Produced by Product Category:

db.ProductionRecords.aggregate([
{
$lookup: {
from: "Products",
localField: "productId",
foreignField: "_id",
as: "productDetails"
}
},
{ $unwind: "$productDetails" },
{
$group: {
_id: "$productDetails.category",
totalQuantityProduced: { $sum: "$quantityProduced" }
}
}
]).toArray();
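The total inventory value summary mentioned in the algorithm can be sketched as a single $group stage over the Products collection, for example:

// Total inventory value across all products (price × stockQuantity)
db.Products.aggregate([
    {
        $group: {
            _id: null,
            totalInventoryValue: { $sum: { $multiply: ["$price", "$stockQuantity"] } }
        }
    }
]).toArray();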
4. Indexing
Create Indexes:

// Create an index on productName for faster queries
db.Products.createIndex({ productName: 1 });

// Create an index on employeeName for faster queries
db.Employees.createIndex({ employeeName: 1 });

// Create an index on productId in ProductionRecords for faster joins
db.ProductionRecords.createIndex({ productId: 1 });

OUTPUT:

Read Operation - Products Collection:

Aggregation Output - Total Quantity Produced by Category:


RESULT: Thus, leveraging MongoDB to efficiently manage and analyze data related to
inventory, employee information, and production tracking has been successfully executed.

Ex.No.: 10

INTEGRATING WEB DATABASE

Build Web applications using the Java servlet API for storing data in databases that can be
queried using a variant of SQL.

PROBLEM STATEMENT:

You are hired to develop a web application for a company to manage its employee
information. The application should allow the administrative staff to add, view, update, and
delete employee records. This web application should be built using PHP for server-side
processing and MySQL for storing employee data.

1. Set up a PHP and MySQL development environment.


2. Connect PHP to a MySQL database to perform CRUD operations.
3. Build a user-friendly interface using HTML and PHP.
4. Implement form validation and user input sanitization.
5. Understand and implement basic security measures, including SQL injection
prevention.
INTRODUCTION:

An integrated database refers to a system where data is stored in a manner that allows
seamless integration with different software applications, particularly web applications. It
often involves using relational databases that can be queried using SQL (Structured Query
Language) or its variants. The integration allows developers to create, retrieve, update, and
delete (CRUD) operations on data through web applications, ensuring that the data remains
consistent and accessible.

In the context of web applications, Java Servlet API is commonly used to build dynamic web
applications that interact with databases. Java Servlets handle HTTP requests and responses,
while JDBC (Java Database Connectivity) allows Java applications to interact with relational
databases.

STEP-BY-STEP-PROCESS:

1. Setup Development Environment

Before you start, ensure you have the following tools and software installed:

 JDK (Java Development Kit): To compile and run Java programs.


 Apache Tomcat: A web server that supports Java Servlets.
 MySQL or PostgreSQL: A relational database for storing data.
 IDE (Integrated Development Environment): Such as Eclipse or IntelliJ IDEA.

2. Create a Database

First, create a database that your web application will interact with.

a. Create a Database and Table in MySQL
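The exact schema is not specified here; a minimal assumed schema for the registration and login flow used in the later steps might be:

-- Assumed schema; table and column names are illustrative
CREATE DATABASE user_management;
USE user_management;

CREATE TABLE users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    password VARCHAR(255) NOT NULL
);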

3. Set Up a Java Web Application

a. Create a Dynamic Web Project in Eclipse

1. Open Eclipse and go to File > New > Dynamic Web Project.
2. Enter the project name (e.g., UserManagementApp), select the target runtime (e.g.,
Apache Tomcat), and click Finish.

b. Add JDBC Driver to the Project

1. Download the JDBC driver for your database (e.g., MySQL Connector/J for MySQL).
2. Right-click on your project, go to Build Path > Configure Build Path, and add the
JDBC driver JAR file to the project’s classpath.

4. Configure Database Connection

Create a utility class to manage database connections.

a. Database Connection Utility
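A minimal sketch of such a utility, assuming the MySQL JDBC driver is on the classpath and the user_management schema from step 2 (credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Simple JDBC connection utility; URL and credentials are placeholders
public class DBUtil {
    private static final String URL = "jdbc:mysql://localhost:3306/user_management";
    private static final String USER = "root";
    private static final String PASSWORD = "";

    public static Connection getConnection() throws SQLException {
        return DriverManager.getConnection(URL, USER, PASSWORD);
    }
}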

5. Develop Servlets to Handle HTTP Requests


a. Create a Servlet to Handle User Registration
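A minimal sketch of such a servlet, assuming the javax.servlet API, the assumed users table from step 2, and the DBUtil class above (in practice the password should also be hashed before storage):

import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical registration servlet built on the DBUtil sketch above
@WebServlet("/register")
public class RegisterServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String username = request.getParameter("username");
        String password = request.getParameter("password");

        // PreparedStatement keeps user input out of the SQL text itself
        try (Connection conn = DBUtil.getConnection();
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO users (username, password) VALUES (?, ?)")) {
            ps.setString(1, username);
            ps.setString(2, password);
            ps.executeUpdate();
            response.sendRedirect("login.jsp");
        } catch (Exception e) {
            throw new ServletException("Registration failed", e);
        }
    }
}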

b. Create a Servlet to Handle User Login

6. Create JSP Pages for User Interface

a. Registration Page (register.jsp)

b. Login Page (login.jsp)

7. Deploy and Run the Application

1. Deploy the Web Application: Right-click on the project and select Run As > Run on
Server.
2. Access the Application: Open a web browser and go to
http://localhost:8080/UserManagementApp/register.jsp to register a user, or
http://localhost:8080/UserManagementApp/login.jsp to log in.

APPLICATIONS:

1. E-commerce Platforms

2. Content Management Systems (CMS)

3. Customer Relationship Management (CRM) Systems

4. Healthcare Management Systems

5. Educational Platforms

6. Social Media Platforms

AIM:

To enable a company to manage employee information effectively. The application allows
administrative staff to perform CRUD (Create, Read, Update, Delete) operations on
employee records, ensuring a smooth and efficient way to handle employee data.

ALGORITHM:

Setup Environment
 Install a local server environment like XAMPP or WAMP that includes PHP and
MySQL.
 Create a MySQL database and table for storing employee information.

Connect PHP to MySQL

 Establish a connection to the MySQL database using PHP’s mysqli or PDO extension.
 Create functions for CRUD operations.

Build User Interface

 Design HTML forms for adding, viewing, updating, and deleting employee records.
 Use PHP to handle form submissions and display data.

Implement Form Validation and Sanitization

 Validate user input to ensure it meets expected formats and constraints.


 Sanitize input to prevent security vulnerabilities.

Basic Security Measures

 Implement prepared statements to prevent SQL injection.


 Use secure password hashing for user authentication if required.
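If authentication is added, PHP's built-in hashing API covers this; a small sketch (form field names are assumptions):

<?php
// Hash the password once at registration (salted automatically)
$hash = password_hash($_POST['password'], PASSWORD_DEFAULT);

// At login, verify the submitted password against the stored hash
if (password_verify($_POST['password'], $hash)) {
    echo "Login successful";
} else {
    echo "Invalid credentials";
}
?>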

IMPLEMENTATION:

Set Up PHP and MySQL Development Environment

1. Install XAMPP/WAMP:
o Download and install XAMPP (Windows, Linux) or WAMP (Windows) from
their official sites.
o Start Apache and MySQL services from the control panel.
2. Create Database and Table:

CREATE DATABASE company_db;
USE company_db;

CREATE TABLE employees (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    position VARCHAR(100),
    department VARCHAR(100),
    email VARCHAR(100) UNIQUE NOT NULL
);
2. Connect PHP to MySQL

db_connect.php

<?php
$servername = "localhost";
$username = "root";
$password = "";
$database = "company_db";

// Create connection
$conn = new mysqli($servername, $username, $password, $database);

// Check connection
if ($conn->connect_error) {
die("Connection failed: " . $conn->connect_error);
}
?>

3. Build User Interface

index.php

<?php include 'db_connect.php'; ?>

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Employee Management</title>
</head>
<body>
<h1>Employee Management System</h1>

<h2>Add Employee</h2>
<form action="add_employee.php" method="post">
Name: <input type="text" name="name" required><br>
Position: <input type="text" name="position"><br>
Department: <input type="text" name="department"><br>
Email: <input type="email" name="email" required><br>
<input type="submit" value="Add Employee">
</form>

<h2>View Employees</h2>
<?php
$sql = "SELECT * FROM employees";
$result = $conn->query($sql);

    if ($result->num_rows > 0) {
        echo "<table border='1'><tr><th>ID</th><th>Name</th><th>Position</th><th>Department</th><th>Email</th><th>Actions</th></tr>";
        while ($row = $result->fetch_assoc()) {
            echo "<tr><td>{$row['id']}</td><td>{$row['name']}</td><td>{$row['position']}</td><td>{$row['department']}</td><td>{$row['email']}</td>";
            echo "<td><a href='update_employee.php?id={$row['id']}'>Update</a> | <a href='delete_employee.php?id={$row['id']}'>Delete</a></td></tr>";
        }
        echo "</table>";
    } else {
        echo "No employees found.";
    }
?>
</body>
</html>

add_employee.php

<?php
include 'db_connect.php';

$name = $conn->real_escape_string($_POST['name']);
$position = $conn->real_escape_string($_POST['position']);
$department = $conn->real_escape_string($_POST['department']);
$email = $conn->real_escape_string($_POST['email']);

$sql = "INSERT INTO employees (name, position, department, email) VALUES ('$name',
'$position', '$department', '$email')";

if ($conn->query($sql) === TRUE) {
    echo "New record created successfully. <a href='index.php'>Go back</a>";
} else {
    echo "Error: " . $sql . "<br>" . $conn->error;
}

$conn->close();
?>

update_employee.php

<?php
include 'db_connect.php';

if ($_SERVER["REQUEST_METHOD"] == "POST") {
$id = (int)$_POST['id'];
$name = $conn->real_escape_string($_POST['name']);
$position = $conn->real_escape_string($_POST['position']);
$department = $conn->real_escape_string($_POST['department']);
$email = $conn->real_escape_string($_POST['email']);

$sql = "UPDATE employees SET name='$name', position='$position',


department='$department', email='$email' WHERE id=$id";

if ($conn->query($sql) === TRUE) {


echo "Record updated successfully. <a href='index.php'>Go back</a>";
} else {
echo "Error: " . $sql . "<br>" . $conn->error;
}

$conn->close();
} else {
$id = (int)$_GET['id'];
$sql = "SELECT * FROM employees WHERE id=$id";
$result = $conn->query($sql);
$employee = $result->fetch_assoc();
}
?>

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Update Employee</title>
</head>
<body>
<h1>Update Employee</h1>
<form action="update_employee.php" method="post">
<input type="hidden" name="id" value="<?php echo $employee['id']; ?>">
Name: <input type="text" name="name" value="<?php echo $employee['name']; ?>"
required><br>
Position: <input type="text" name="position" value="<?php echo $employee['position'];
?>"><br>
Department: <input type="text" name="department" value="<?php echo
$employee['department']; ?>"><br>
Email: <input type="email" name="email" value="<?php echo $employee['email']; ?>"
required><br>
<input type="submit" value="Update Employee">
</form>
</body>
</html>

delete_employee.php
<?php
include 'db_connect.php';

$id = (int)$_GET['id'];

$sql = "DELETE FROM employees WHERE id=$id";

if ($conn->query($sql) === TRUE) {
    echo "Record deleted successfully. <a href='index.php'>Go back</a>";
} else {
    echo "Error: " . $sql . "<br>" . $conn->error;
}

$conn->close();
?>

OUTPUT:

Adding an Employee

After filling out the form and submitting, you would see:

New record created successfully. <a href='index.php'>Go back</a>

Viewing Employees

You would see a table with employee details and action links:
Updating an Employee

After updating, you would see:

Record updated successfully. <a href='index.php'>Go back</a>

Deleting an Employee

After deleting, you would see:

Record deleted successfully. <a href='index.php'>Go back</a>

Security Measures

SQL Injection Prevention: Use prepared statements to prevent SQL injection. For
instance, modify add_employee.php as follows:

Using Prepared Statements
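A sketch of add_employee.php rewritten with mysqli prepared statements (same table and fields as above):

<?php
include 'db_connect.php';

$name = $_POST['name'];
$position = $_POST['position'];
$department = $_POST['department'];
$email = $_POST['email'];

// Placeholders keep user input out of the SQL string entirely
$stmt = $conn->prepare(
    "INSERT INTO employees (name, position, department, email) VALUES (?, ?, ?, ?)"
);
$stmt->bind_param("ssss", $name, $position, $department, $email);

if ($stmt->execute()) {
    echo "New record created successfully. <a href='index.php'>Go back</a>";
} else {
    echo "Error: " . $stmt->error;
}

$stmt->close();
$conn->close();
?>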


RESULT: Thus, a web application that enables a company to manage employee information
effectively, allowing administrative staff to perform CRUD (Create, Read, Update, Delete)
operations on employee records and ensuring a smooth and efficient way to handle employee
data, has been successfully executed.
