Ca - 601: Recent Trends in Information Technology
2-mark answers:
1. What is OLTP?
OLTP (Online Transaction Processing) refers to a system that manages real-time transaction
data, ensuring fast, accurate, and consistent processing. It is used in applications like banking
and e-commerce.
2. What is OLAP?
OLAP (Online Analytical Processing) is a data processing system that enables complex
queries and multidimensional analysis for business intelligence and decision-making.
7. What is Robotics?
Robotics is the branch of technology that deals with the design, construction, and operation
of robots to automate tasks typically performed by humans.
8. Define Spark.
Apache Spark is an open-source big data processing framework that enables fast, distributed
computing with support for machine learning, SQL, and real-time data streaming.
• Autonomous vehicles
4-mark answers
A Data Warehouse (DW) is a system designed to store, integrate, and analyze large amounts of
structured data from various sources to support decision-making processes. It follows a three-tier
architecture:
This layer is responsible for extracting data from multiple sources, including:
The data is extracted using ETL (Extract, Transform, Load) processes, where it is cleaned,
transformed, and loaded into the data warehouse.
This is the core of the data warehouse, where processed data is stored and structured for analytical
queries.
• Data Processing: This layer supports OLAP (Online Analytical Processing) for
multidimensional analysis.
• Metadata Management: Stores definitions, rules, and business logic for understanding data
relationships.
• Data Mining & Machine Learning for trend analysis and predictions.
Apache Spark is an open-source, distributed computing system used for big data processing and
analytics. It is known for its speed, scalability, and ease of use.
1. Spark Core
2. Spark SQL
3. Spark Streaming
5. GraphX
6. Cluster Managers
▪ Apache Mesos
▪ Hadoop YARN
▪ Kubernetes
Artificial Intelligence (AI) uses various techniques to simulate human intelligence. The three major AI
techniques are:
• Three types:
o Reinforcement Learning – Learns through trial and error (e.g., robotics, self-driving
cars).
2. Expert Systems
• Example: Medical diagnosis systems, where the system suggests treatments based on
symptoms.
OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) serve different
purposes in data management.
Feature | OLTP | OLAP
Query Complexity | Simple (INSERT, UPDATE, DELETE) | Complex (JOINs, GROUP BY, aggregation)
Data Volume | Small (a few MBs to GBs) | Large (GBs to TBs or more)
Performance | Optimized for fast transactions | Optimized for complex analytical queries
Conclusion
• OLTP systems are designed for day-to-day transaction processing (e.g., banking, retail).
• OLAP systems are optimized for analytical processing, helping businesses make better
decisions.
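The contrast can be sketched with Python's built-in sqlite3 module. This is only an illustration on a tiny in-memory database: the `orders` table and its values are hypothetical, and a real OLAP system would run on a separate warehouse rather than on the transactional database.

```python
import sqlite3

# Hypothetical in-memory "orders" table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style work: a short transaction that inserts or updates a few rows.
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("North", 120.0))
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("South", 80.0))
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("North", 50.0))
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (90.0, 2))

# OLAP-style work: a read-only aggregate that scans and groups many rows.
olap_rows = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(olap_rows)
```

The transaction touches individual rows by key, while the analytical query summarizes the whole table by region.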
MOLAP (Multidimensional OLAP) and HOLAP (Hybrid OLAP) are two types of OLAP (Online Analytical
Processing) systems used for data analysis in a data warehouse.
Feature | MOLAP | HOLAP
Data Volume | Limited scalability due to storage constraints. | Can handle large datasets by storing detail data in relational databases.
Processing Time | Requires more time to preprocess and aggregate data. | More flexible, as it can switch between pre-aggregated data and detailed data.
Use Case | Best for quick, complex queries in smaller datasets. | Best for applications that require both high performance and scalability.
Summary
• HOLAP is a hybrid approach that provides a balance between speed and scalability.
Data mining is the process of extracting hidden patterns and insights from large datasets. The most
common data mining techniques include:
1. Classification
2. Clustering
• Example: Market Basket Analysis – "Customers who buy bread often buy butter."
4. Regression Analysis
5. Anomaly Detection
6. Decision Trees
7. Neural Networks
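The Market Basket Analysis example above ("Customers who buy bread often buy butter") can be made concrete by computing the support and confidence of the rule bread → butter. The sketch below uses a small, invented set of transactions; the function names are illustrative and not taken from any library.

```python
# Hypothetical shopping baskets, invented for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated probability of `consequent` given `antecedent`."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(rule_support, rule_confidence)
```

Here bread and butter appear together in 2 of 4 baskets (support 0.5), and 2 of the 3 baskets containing bread also contain butter (confidence about 0.67).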
Hill Climbing is an AI search algorithm used to find an optimal solution by incrementally improving a
candidate solution.
o The algorithm may get stuck in a suboptimal solution instead of finding the global
best solution.
2. Plateau Problem
o In flat regions of the search space, the algorithm may stop making progress.
3. Ridges Problem
o Narrow paths that require movement in multiple directions can cause difficulty in
finding the best path.
4. No Backtracking
o Hill Climbing does not remember past states, so it cannot recover if it gets stuck.
Example
• In robotics, hill climbing may result in the robot getting stuck in a small hole instead of
reaching the destination.
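The local-maximum and plateau problems can be seen in a minimal sketch of simple hill climbing over a one-dimensional list of scores. The `landscape` values are made up for illustration, and the neighbor rule (index minus or plus one) is an assumption chosen to keep the example small.

```python
def hill_climb(values, start):
    """Move to the better neighbor until no neighbor improves; return the final index."""
    current = start
    while True:
        neighbors = [i for i in (current - 1, current + 1) if 0 <= i < len(values)]
        best = max(neighbors, key=lambda i: values[i])
        # Stop on a peak or a plateau: no neighbor is strictly better.
        if values[best] <= values[current]:
            return current
        current = best


landscape = [1, 3, 5, 4, 2, 6, 9, 7]
local_peak = hill_climb(landscape, start=0)   # stops at index 2 (value 5)
global_peak = hill_climb(landscape, start=4)  # reaches index 6 (value 9)
print(local_peak, global_peak)
```

Starting from index 0 the search halts at the local maximum 5 and never sees the global maximum 9, because it keeps no memory of other states and cannot backtrack.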
DFS is a search algorithm that explores a path deep into a graph before backtracking.
Disadvantages of DFS
o If the tree or graph is deep, DFS uses a large amount of stack space (risk of stack
overflow).
Example
• In maze solving, DFS may go down the wrong path and backtrack multiple times, making it
inefficient.
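o Each search only has to reach about half the solution depth, so roughly O(b^(d/2)) nodes are explored instead of O(b^d), usually far fewer than a one-directional BFS or DFS visits.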
o Since the search expands from both sides, fewer nodes are stored in memory.
o If using BFS in both directions, Bidirectional Search finds the shortest path.
o Instead of searching the entire space, it meets in the middle, significantly reducing
time complexity.
• GPS Navigation Systems: When finding the shortest route, bidirectional search can reduce
search time by working from both the current location and the destination.
BFS is a graph traversal algorithm used in AI to explore all possible solutions level by level.
3. Move to the next level and repeat until the goal is found.
Characteristics of BFS
• Social Network Connections: BFS can find the shortest link between two people.
Both Apache Spark and MapReduce are used for big data processing, but Spark is usually much faster because it keeps intermediate data in memory instead of writing it to disk between steps.
Feature | Apache Spark | MapReduce
Computing Type | In-memory computing. | Disk-based computing.
Ease of Use | Simpler APIs in Python, Scala, Java, R. | Complex Java-based code.
Iterative Processing | Best for ML & AI (fast iterations). | Not efficient for iterative tasks.
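The map, shuffle, and reduce stages that both frameworks build on can be illustrated with a single-process word count in plain Python. This is a conceptual sketch only: it does not use the Spark or Hadoop APIs, and the sample documents are invented. In Hadoop MapReduce the intermediate results of each stage are written to disk, whereas Spark can keep them in memory.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical input documents.
documents = ["spark is fast", "mapreduce is disk based", "spark is in memory"]

# Map: emit a (word, 1) pair for every word.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group all emitted values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: combine the values of each key into one result.
word_counts = {word: reduce(lambda a, b: a + b, counts) for word, counts in groups.items()}
print(word_counts)
```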
Conclusion
A Data Warehouse stores historical data for analytics. It is widely used in various industries.
• Companies use BI tools (Tableau, Power BI) to generate insights from stored data.
• Banks use data warehouses to detect suspicious transactions and assess risk.
13. What Are the Major Steps Involved in the ETL Process?
ETL (Extract, Transform, Load) is the process of moving data from various sources into a data
warehouse.
Step 1: Extract
Step 2: Transform
• Operations include:
o Removing duplicates
Step 3: Load
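The three steps can be sketched end to end using only the Python standard library. The CSV text, the `customers` table, and the function names are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data with stray spaces and a duplicate row.
raw_csv = """id,name,amount
1, alice ,10.50
2,Bob,20
2,Bob,20
3,carol,7.25
"""


def extract(text):
    """Extract: read rows from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert types, normalize names, and remove duplicates."""
    seen, clean = set(), []
    for row in rows:
        record = (int(row["id"]), row["name"].strip().title(), float(row["amount"]))
        if record not in seen:
            seen.add(record)
            clean.append(record)
    return clean


def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT, amount REAL)")
    with conn:
        conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)


conn = sqlite3.connect(":memory:")
load(transform(extract(raw_csv)), conn)
loaded = conn.execute("SELECT id, name, amount FROM customers ORDER BY id").fetchall()
print(loaded)
```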
Example
Problem Statement
The Missionaries and Cannibals problem is a classic AI problem that involves three missionaries and
three cannibals trying to cross a river using a boat. The boat can hold at most two people at a time.
• The boat cannot cross the river without at least one person onboard.
• If at any point the cannibals outnumber the missionaries on either side of the river (where at least one missionary is present), the cannibals will eat the missionaries.
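One way to solve the problem is a breadth-first search over states. The sketch below assumes a state of the form (missionaries on the left bank, cannibals on the left bank, boat on the left bank?) and treats a bank as safe when it has no missionaries or the missionaries are not outnumbered. The function names and the state encoding are choices made for this illustration.

```python
from collections import deque


def is_safe(missionaries, cannibals):
    """A bank is safe if it has no missionaries or they are not outnumbered."""
    return missionaries == 0 or missionaries >= cannibals


def solve_missionaries_cannibals(total=3, capacity=2):
    """Return a shortest sequence of states from start to goal, or None."""
    start, goal = (total, total, True), (0, 0, False)
    # Every boat load with at least one and at most `capacity` people.
    loads = [(m, c) for m in range(capacity + 1) for c in range(capacity + 1)
             if 1 <= m + c <= capacity]
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        m, c, boat_left = state
        sign = -1 if boat_left else 1  # people leave the bank the boat is on
        for dm, dc in loads:
            nm, nc = m + sign * dm, c + sign * dc
            if not (0 <= nm <= total and 0 <= nc <= total):
                continue
            if is_safe(nm, nc) and is_safe(total - nm, total - nc):
                nxt = (nm, nc, not boat_left)
                if nxt not in parent:
                    parent[nxt] = state
                    queue.append(nxt)
    return None


path = solve_missionaries_cannibals()
print(len(path) - 1, "crossings")
```

Because BFS explores states level by level, the first solution it returns uses the fewest crossings, which is 11 for three missionaries, three cannibals, and a two-person boat.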
Graph mining is a data mining technique used to analyze and extract patterns from graph-based
structures. It is used in applications like social networks, fraud detection, and web page ranking.
1. Frequent Subgraph Mining – Identifies recurring patterns in graphs (e.g., social network
friendships).
3. Community Detection – Groups similar nodes into clusters (e.g., customer segmentation).
Example Applications:
A heuristic function (h(n)) is used in AI search algorithms to estimate the cost of reaching the goal
from a given node. It helps in optimizing search algorithms like A* and Greedy Search.
Example:
Real-World Example:
• In a GPS Navigation System, the heuristic function estimates the remaining distance to the
destination using a straight-line distance (Euclidean distance).
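A straight-line heuristic of this kind takes only a few lines. In the sketch below the coordinates, the node names, and the function name are hypothetical; the greedy step simply picks the candidate with the smallest h(n).

```python
import math


def euclidean_heuristic(node, goal):
    """h(n): straight-line distance between two (x, y) points."""
    return math.hypot(goal[0] - node[0], goal[1] - node[1])


h = euclidean_heuristic((0, 0), (3, 4))

# Greedy choice: expand the candidate that looks closest to the goal.
candidates = {"B": (1, 1), "C": (3, 3), "D": (0, 5)}
goal = (4, 4)
best = min(candidates, key=lambda name: euclidean_heuristic(candidates[name], goal))
print(h, best)
```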
Data Mining is used to discover patterns and trends from large datasets.
• Example: Amazon product recommendations ("Customers who bought this also bought...").
Graph search algorithms are used in Artificial Intelligence (AI) and Computer Science to traverse or
search through graphs and trees.
What is BFS?
• BFS is a level-order traversal technique where all nodes at the current depth level are
explored before moving to the next level.
3. Dequeue a node from the front and explore its unvisited neighbors.
Given Graph:
      A
     / \
    B   C
   / \   \
  D   E   F
BFS Order: A → B → C → D → E → F
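The traversal can be sketched with a queue. The adjacency list below encodes the tree as this sketch assumes it: root A with children B and C, B with children D and E, and C with child F. The function name is illustrative.

```python
from collections import deque

graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"], "D": [], "E": [], "F": []}


def bfs_order(graph, start):
    """Return nodes in the order BFS first visits them."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()  # dequeue from the front
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                visited.append(neighbor)
                queue.append(neighbor)  # enqueue at the back
    return visited


order = bfs_order(graph, "A")
print(" → ".join(order))
```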
Characteristics of BFS
Use Cases
What is DFS?
1. Start from the root node and push it onto the stack.
2. Pop the top node, explore its unvisited neighbors, and push them onto the stack.
Given Graph:
      A
     / \
    B   C
   / \   \
  D   E   F
DFS Order: A → B → D → E → C → F
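The same traversal can be sketched with an explicit stack. The adjacency list assumes root A with children B and C, B with children D and E, and C with child F; neighbors are pushed in reverse so that the left-most child is popped first. The function name is illustrative.

```python
graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"], "D": [], "E": [], "F": []}


def dfs_order(graph, start):
    """Return nodes in the order an iterative DFS visits them."""
    visited = []
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()  # pop the top of the stack
        if node in seen:
            continue
        seen.add(node)
        visited.append(node)
        # Push neighbors in reverse so the first-listed neighbor is explored first.
        for neighbor in reversed(graph[node]):
            if neighbor not in seen:
                stack.append(neighbor)
    return visited


order = dfs_order(graph, "A")
print(" → ".join(order))
```

Tracking `seen` is what keeps the search from looping forever when the graph contains cycles.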
Characteristics of DFS
✔ Uses less memory than BFS (does not store all nodes at a level).
✔ Works well with deep graphs.
✔ Good for topological sorting (dependency graphs).
✘ Does not guarantee the shortest path.
✘ Can get stuck in infinite loops if cycles exist and visited nodes are not tracked.
Use Cases
3. Bidirectional Search
1. Start BFS from both the initial node and goal node.
Graph:
A -- B -- C -- D -- E -- F
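A minimal sketch of bidirectional BFS on this chain graph follows. It assumes an undirected, unweighted graph stored as an adjacency list and alternates one full level of expansion from each side; the function and variable names are illustrative.

```python
from collections import deque


def bidirectional_search(graph, start, goal):
    """Return a shortest path in an undirected, unweighted graph, or None."""
    if start == goal:
        return [start]
    parents_fwd, parents_bwd = {start: None}, {goal: None}
    frontier_fwd, frontier_bwd = deque([start]), deque([goal])

    def expand(frontier, parents, other_parents):
        # Expand one full level; return a node where the two searches meet.
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for neighbor in graph[node]:
                if neighbor not in parents:
                    parents[neighbor] = node
                    if neighbor in other_parents:
                        return neighbor
                    frontier.append(neighbor)
        return None

    while frontier_fwd and frontier_bwd:
        meet = expand(frontier_fwd, parents_fwd, parents_bwd)
        if meet is None:
            meet = expand(frontier_bwd, parents_bwd, parents_fwd)
        if meet is not None:
            # Walk back to the start, then forward to the goal.
            path, node = [], meet
            while node is not None:
                path.append(node)
                node = parents_fwd[node]
            path.reverse()
            node = parents_bwd[meet]
            while node is not None:
                path.append(node)
                node = parents_bwd[node]
            return path
    return None


chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
         "D": ["C", "E"], "E": ["D", "F"], "F": ["E"]}
path = bidirectional_search(chain, "A", "F")
print(" -- ".join(path))
```

On this graph the forward search reaches C and the backward search reaches D before the two frontiers meet at D, so each side explores only about half of the chain.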
Use Cases
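Criterion | BFS | DFS | Bidirectional Search
Completeness | Always finds a solution | If the graph is finite | Always finds a solution
Complexity (Worst Case) | O(V + E) | O(V + E) | O(b^(d/2))
(b = branching factor, d = depth of the shallowest solution; O(2^(d/2)) is the special case b = 2.)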
Problem Statement
Given two jugs with capacities X liters and Y liters, and an unlimited water supply, the goal is to
measure exactly Z liters using the jugs.
Example:
Valid Operations
3. Pour Water → Transfer water from one jug to another until the receiving jug is full or the
pouring jug is empty.
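The problem can be solved by a breadth-first search over (jug X, jug Y) states. The sketch below assumes the usual operation set of filling a jug, emptying a jug, and pouring from one jug into the other, and it treats the goal as reached when either jug holds exactly Z liters. The function name and the sample capacities are hypothetical.

```python
from collections import deque


def solve_water_jug(x, y, z):
    """Return a shortest list of (jug_x, jug_y) states that measures z liters, or None."""
    start = (0, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        a, b = state
        if a == z or b == z:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        pour_ab = min(a, y - b)  # amount that fits when pouring X into Y
        pour_ba = min(b, x - a)  # amount that fits when pouring Y into X
        successors = [
            (x, b), (a, y),              # fill X, fill Y
            (0, b), (a, 0),              # empty X, empty Y
            (a - pour_ab, b + pour_ab),  # pour X into Y
            (a + pour_ba, b - pour_ba),  # pour Y into X
        ]
        for nxt in successors:
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


steps = solve_water_jug(4, 3, 2)
print(steps)
```

The search returns None when no sequence of operations works, for example when both capacities are even and Z is odd.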
Diagram Representation
2. Snowflake Schema
Definition:
The Snowflake Schema is a type of database schema used in data warehousing, where a central fact
table is connected to multiple dimension tables, which are further normalized into sub-dimensions.
Characteristics:
Example Diagram:
Time_Dim Location_Dim
| |
V V
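A snowflake schema can be sketched in SQLite through Python's sqlite3 module. Time_Dim and Location_Dim follow the diagram; the Sales_Fact table, the Country_Dim sub-dimension, and all column names and sample rows are hypothetical additions for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Sub-dimension: Location_Dim is normalized so countries live in their own table.
CREATE TABLE Country_Dim  (country_id INTEGER PRIMARY KEY, country_name TEXT);
CREATE TABLE Location_Dim (location_id INTEGER PRIMARY KEY, city TEXT,
                           country_id INTEGER REFERENCES Country_Dim(country_id));
CREATE TABLE Time_Dim     (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- Central fact table referencing the dimensions.
CREATE TABLE Sales_Fact   (sale_id INTEGER PRIMARY KEY,
                           time_id INTEGER REFERENCES Time_Dim(time_id),
                           location_id INTEGER REFERENCES Location_Dim(location_id),
                           amount REAL);
INSERT INTO Country_Dim VALUES (1, 'India');
INSERT INTO Location_Dim VALUES (10, 'Pune', 1), (11, 'Mumbai', 1);
INSERT INTO Time_Dim VALUES (100, 2024, 1);
INSERT INTO Sales_Fact VALUES (1, 100, 10, 250.0), (2, 100, 11, 150.0);
""")

# Reaching the country requires an extra join through the sub-dimension.
totals = conn.execute("""
    SELECT c.country_name, t.year, SUM(f.amount)
    FROM Sales_Fact f
    JOIN Time_Dim t     ON t.time_id = f.time_id
    JOIN Location_Dim l ON l.location_id = f.location_id
    JOIN Country_Dim c  ON c.country_id = l.country_id
    GROUP BY c.country_name, t.year
""").fetchall()
print(totals)
```

The extra join through Country_Dim is the cost of normalization: less redundancy in the dimension tables, but more joins per query than a star schema would need.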
3. ETL Process
Definition:
ETL (Extract, Transform, Load) is a process in data warehousing to collect, process, and store data.
1. Extract: Fetch data from multiple sources (Databases, APIs, CSV files).
2. Transform: Clean, format, and structure data (e.g., removing duplicates, converting formats).
Diagram Representation:
Definition:
MEA is a heuristic search strategy that breaks down a large problem into smaller sub-problems by
reducing the difference between the current state and the goal state.
Steps in MEA: