Ict 2
Database Concepts
Relational databases
• A relational database is a collection of data organized into tables with rows
and columns.
• It's a way to store and manage data in a structured way, making it easier to
understand how different data sets relate to each other.
• RDBMS stands for Relational Database Management System.
• The data in a relational database is modeled in tables, making querying and processing
efficient.
• It is a program that allows us to create, delete, and update a relational database.
• A Relational Database is a database system that stores and retrieves data in a
tabular format organized in the form of rows and columns.
• It is based on the relational model proposed by E.F. Codd in 1970.
• Major database systems such as MySQL, Oracle, and Microsoft SQL Server are built
on the principles of the relational model.
Features of RDBMS
•Data must be stored in tabular form in DB file, that is, it should be organized in the
form of rows and columns.
•Each row of the table is called a record/tuple. The number of such records is known as
the cardinality of the table.
•Each column of the table is called an attribute/field. The number of such columns is
called the arity (or degree) of the table.
•No two records of a table can be identical. Data duplication is therefore avoided by
using a candidate key. A candidate key is a minimal set of attributes required to identify
each record uniquely.
•Tables are related to each other with the help of foreign keys.
•Database tables also allow NULL values: if the value of any element of a row is not
filled in or is missing, it is stored as NULL, which is not equivalent to zero.
(NOTE: A primary key cannot have a NULL value.)
SQL
•SQL, or Structured Query Language, is a programming language used to store,
retrieve, and manipulate data in relational databases.
•It is a standard Database language that is used to create, maintain, and retrieve the
relational database.
•SQL is a powerful language that can be used to carry out a wide range of operations
such as insert, delete, and update.
SQL is mainly divided into four main categories:
1.Data Definition Language (DDL)
2.Data Manipulation Language (DML)
3.Transaction Control Language (TCL)
4.Data Query Language (DQL)
Uses of SQL
•Data storage: SQL is used to store information in a database in tabular form, with
rows and columns representing different data attributes.
•Data retrieval: SQL is used to retrieve specific data items or a range of items from a
database.
•Data manipulation: SQL is used to add new data, remove or modify existing data.
•Access control: SQL is used to restrict a user's ability to retrieve, add, and modify
data.
•Data sharing: SQL is used to coordinate data sharing by concurrent users.
Components of a SQL System
A SQL system consists of several key components that work together to enable
efficient data storage, retrieval, and manipulation.
Some of the Key components of a SQL System are:
•Databases: Databases are structured collections of data organized into tables, rows,
and columns.
•Tables: Tables are the fundamental building blocks of a database, consisting of rows
(records) and columns (attributes or fields).
•Queries: Queries are SQL commands used to interact with databases.
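The three components can be seen working together in a short sketch using Python's built-in sqlite3 module (the table name, columns, and sample rows are illustrative):

```python
import sqlite3

# An in-memory database; the schema and data below are illustrative.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Table: rows are records, columns are attributes (DDL)
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")

# Insert records (DML)
cur.executemany("INSERT INTO students VALUES (?, ?, ?)",
                [(1, "Asha", 82), (2, "Bilal", 74)])

# Query: retrieve specific data items (DQL)
cur.execute("SELECT name FROM students WHERE marks > 80")
rows = cur.fetchall()
print(rows)  # [('Asha',)]
```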
Characteristics of SQL
•User-friendly and accessible; a declarative language
•Efficient database management; a standardized language
•SQL does not require a continuation character for multi-line queries, allowing
flexibility in writing commands across one or multiple lines.
•Queries are executed using a termination character (e.g., a semicolon ;), enabling
immediate and accurate command processing.
•SQL includes a rich set of built-in functions for data manipulation, aggregation, and
formatting, empowering users to handle diverse data-processing needs effectively.
Characteristics of a DBMS:
•Data storage: DBMS uses a digital repository on a server to store and manage
information.
•Data manipulation: DBMS provides a logical and clear view of the process that
manipulates data.
•Data security: DBMS provides data security.
•Data backup and recovery: DBMS contains automatic backup and recovery
procedures.
•ACID properties: DBMS contains ACID properties that maintain data in a healthy
state in case of failure.
•Concurrency control: In multi-user environments, DBMS manages concurrent access
to the database to prevent conflicts and ensure data consistency.
Types of DBMS
•Relational Database Management System (RDBMS)
•NoSQL DBMS
•Object-oriented database management system (OODBMS): This DBMS is based on
the principles of object-oriented programming.
•Hierarchical DBMS: This type of DBMS stores data in nodes organized in a parent-child
relationship.
•Multi-model DBMS: This system supports more than one database model.
Application of Database
•Banking: Manages accounts, transactions, and financial records.
•Airlines: Handles bookings, schedules, and availability.
•E-commerce: Supports catalogs, orders, and secure transactions.
•Healthcare: Stores patient records and billing.
•Education: Manages student data and course enrollments.
•Telecom: Tracks call records and billing.
•Government: Maintains census and taxation data.
•Social Media: Stores user profiles and posts efficiently.
Database Languages
•Data Definition Language
•Data Manipulation Language
•Data Control Language
•Transactional Control Language
Disadvantages of DBMS
•Complexity: DBMS can be complex to set up and maintain, requiring specialized
knowledge and skills.
•Performance overhead: The use of a DBMS can add overhead to the performance of
an application, especially in cases where high levels of concurrency are required.
•Scalability: The use of a DBMS can limit the scalability of an application, since it
requires the use of locking and other synchronization mechanisms to ensure data
consistency.
•Cost: The cost of purchasing, maintaining and upgrading a DBMS can be high,
especially for large or complex systems.
•Limited Use Cases: Not all use cases are suitable for a DBMS, some solutions don’t
need high reliability, consistency or security and may be better served by other types of
data storage.
Data Structures
Some of the important data structures are as under:
Arrays
An array is a collection of items of the same type that are stored at contiguous
memory locations.
It is one of the most popular and simple data structures used in programming.
Types of Arrays
Arrays can be classified in two ways:
•On the basis of Size
•On the basis of Dimensions
•One-dimensional Array (1-D Array): You can imagine a 1-D array as a row, where
elements are stored one after another.
•Multi-dimensional Array: A multi-dimensional array is an array with more than one
dimension.
We can use multidimensional array to store complex data in the form of tables, etc. We
can have 2-D arrays, 3-D arrays, 4-D arrays and so on.
•Two-Dimensional Array (2-D Array or Matrix): 2-D Multidimensional arrays can be
considered as an array of arrays or as a matrix consisting of rows and columns.
•Three-Dimensional Array (3-D Array): A 3-D Multidimensional array contains three
dimensions, so it can be considered an array of two-dimensional arrays.
Linked Lists
A linked list is a fundamental data structure in computer science.
It mainly allows efficient insertion and deletion operations compared to arrays.
Like arrays, it is also used to implement other data structures like stack, queue and
deque
Comparison between Linked Lists and Arrays
Features of Linked List:
•Data Structure: Non-contiguous
•Memory Allocation: Typically allocated one by one to individual elements
•Insertion/Deletion: Efficient
•Access: Sequential
Features of Array:
•Data Structure: Contiguous
•Memory Allocation: Typically allocated to the whole array
•Insertion/Deletion: Inefficient
•Access: Random
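The contrast above can be seen in a minimal linked-list sketch, where nodes are allocated one by one and insertion at the head is O(1) with no shifting of elements (class and method names are illustrative):

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None  # pointer to the next (non-contiguous) node


class LinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, data):
        # O(1) insertion: only the head pointer changes, no shifting.
        node = Node(data)
        node.next = self.head
        self.head = node

    def to_list(self):
        # Sequential access: must walk node by node from the head.
        out, cur = [], self.head
        while cur:
            out.append(cur.data)
            cur = cur.next
        return out


lst = LinkedList()
for x in (3, 2, 1):
    lst.push_front(x)
print(lst.to_list())  # [1, 2, 3]
```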
Stacks
•A Stack is a linear data structure that follows a particular order in which the
operations are performed.
•The order may be LIFO(Last In First Out) or FILO(First In Last Out).
•LIFO implies that the element that is inserted last, comes out first and FILO implies
that the element that is inserted first, comes out last.
•It behaves like a stack of plates, where the last plate added is the first one to be
removed.
1.Pushing an element onto the stack is like adding a new plate on top.
2.Popping an element removes the top plate from the stack.
Types of Stack
Fixed Size Stack : As the name suggests, a fixed size stack has a fixed size and
cannot grow or shrink dynamically.
•If the stack is full and an attempt is made to add an element to it, an overflow error
occurs.
•If the stack is empty and an attempt is made to remove an element from it, an
underflow error occurs.
Dynamic Size Stack : A dynamic size stack can grow or shrink dynamically.
•When the stack is full, it automatically increases its size to accommodate the new
element, and when the stack is empty, it decreases its size.
•This type of stack is implemented using a linked list, as it allows for easy resizing of
the stack.
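The plate analogy maps directly onto code. Below is a minimal dynamic-size stack sketch in Python (names and sample values are illustrative; a Python list resizes itself, so no overflow handling is needed):

```python
class Stack:
    """Dynamic-size stack backed by a Python list (top = end of list)."""

    def __init__(self):
        self._items = []

    def push(self, item):
        # Like adding a new plate on top.
        self._items.append(item)

    def pop(self):
        # Removing from an empty stack is an underflow.
        if not self._items:
            raise IndexError("stack underflow")
        return self._items.pop()  # removes the top plate


s = Stack()
s.push("plate1")
s.push("plate2")
print(s.pop())  # plate2 -- last in, first out
```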
Types of Queues:
There are five different types of queues that are used in different scenarios. They are:
1.Input Restricted Queue (this is a Simple Queue)
2.Output Restricted Queue (this is also a Simple Queue)
3.Circular Queue
4.Double Ended Queue (Deque)
5.Priority Queue
•Ascending Priority Queue
•Descending Priority Queue
3.Circular Queue
•Circular Queue is a linear data structure in which the operations are performed based
on FIFO (First In First Out) principle and the last position is connected back to the first
position to make a circle.
•It is also called ‘Ring Buffer’.
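The ring-buffer behaviour can be sketched with a fixed-size list whose indices wrap around via the modulo operator (a minimal illustration; the class name and values are illustrative):

```python
class CircularQueue:
    """Fixed-capacity ring buffer: FIFO, with the last slot wrapping to the first."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0   # index of the front element
        self.size = 0

    def enqueue(self, item):
        if self.size == len(self.buf):
            raise OverflowError("queue full")
        # Tail position wraps around to the start of the buffer.
        self.buf[(self.head + self.size) % len(self.buf)] = item
        self.size += 1

    def dequeue(self):
        if self.size == 0:
            raise IndexError("queue empty")
        item = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)  # wrap around
        self.size -= 1
        return item


q = CircularQueue(3)
for x in (10, 20, 30):
    q.enqueue(x)
print(q.dequeue())  # 10 -- first in, first out
q.enqueue(40)       # reuses the slot freed at the front
```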
5.Priority Queue
•A priority queue is a special type of queue in which each element is associated with a
priority and is served according to its priority.
•There are two types of Priority Queues. They are:
a)Ascending Priority Queue: Element can be inserted arbitrarily but only smallest
element can be removed. For example, suppose there is an array having elements 4, 2,
8 in the same order. So, while inserting the elements, the insertion will be in the same
sequence but while deleting, the order will be 2, 4, 8.
b)Descending priority Queue: Element can be inserted arbitrarily but only the largest
element can be removed first from the given Queue. For example, suppose there is an
array having elements 4, 2, 8 in the same order. So, while inserting the elements, the
insertion will be in the same sequence but while deleting, the order will be 8, 4, 2.
The time complexity of insertion and deletion in a heap-based priority queue is O(log n).
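The 4, 2, 8 example for an ascending priority queue can be reproduced with Python's heapq module, which implements a min-heap (a minimal sketch):

```python
import heapq

# Ascending priority queue: insert in any order, remove smallest first.
pq = []
for x in (4, 2, 8):
    heapq.heappush(pq, x)      # O(log n) insertion

removed = [heapq.heappop(pq) for _ in range(3)]  # O(log n) per removal
print(removed)  # [2, 4, 8]
```

A descending priority queue can be simulated the same way by pushing negated values and negating them again on removal.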
Trees
•Tree data structure is a hierarchical structure that is used to represent and organize
data in the form of parent child relationship.
•The following are some real world situations which are naturally a tree.
a)Folder structure in an operating system.
b)Tag structure in an HTML document (the root tag being the html tag) or an XML document.
•The topmost node of the tree is called the root, and the nodes below it are called the
child nodes.
•Each node can have multiple child nodes, and these child nodes can also have their
own child nodes, forming a recursive structure.
Unstable sorting:
If a sorting algorithm, after sorting the contents, changes the relative order of equal
elements, it is called unstable sorting.
Adaptive and Non-Adaptive Sorting Algorithm
Adaptive Algorithm:
A sorting algorithm is said to be adaptive, if it takes advantage of already 'sorted'
elements in the list that is to be sorted.
That is, while sorting if the source list has some element already sorted, adaptive
algorithms will take this into account and will try not to re-order.
Non-adaptive algorithm:
A non-adaptive algorithm is one which does not take into account the elements which
are already sorted; it processes every single element regardless of whether it is
already in order.
Important Terms:
a) Increasing Order
A sequence of values is said to be in increasing order, if the successive element is
greater than the previous one. For example, 1, 3, 4, 6, 8, 9 are in increasing order, as
every next element is greater than the previous element.
b) Decreasing Order
A sequence of values is said to be in decreasing order, if the successive element is less
than the current one. For example, 9, 8, 6, 4, 3, 1 are in decreasing order, as every
next element is less than the previous element.
c) Non-Increasing Order
A sequence of values is said to be in non-increasing order, if the successive element is
less than or equal to its previous element in the sequence. This order occurs when the
sequence contains duplicate values. For example, 9, 8, 6, 3, 3, 1 are in non-increasing
order, as every next element is less than or equal to (in case of 3) but not greater than
any previous element.
d) Non-Decreasing Order
A sequence of values is said to be in non-decreasing order, if the successive element is
greater than or equal to its previous element in the sequence. This order occurs when
the sequence contains duplicate values. For example, 1, 3, 3, 6, 8, 9 are in non-
decreasing order, as every next element is greater than or equal to (in case of 3) but
not less than the previous one.
Various Sorting Algorithms:
Bubble Sort
•Bubble sort is a simple sorting algorithm.
•This sorting algorithm is comparison-based algorithm in which each pair of adjacent
elements is compared and the elements are swapped if they are not in order.
•This algorithm is not suitable for large data sets as its average and worst case
complexity are of O(n2) where n is the number of items.
•Bubble Sort is an elementary sorting algorithm, which works by repeatedly
exchanging adjacent elements, if necessary.
•When no exchanges are required, the file is sorted.
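The description above, including the "no exchanges means sorted" early exit, can be sketched as a minimal Python illustration (names and sample values are illustrative):

```python
def bubble_sort(arr):
    """Repeatedly swap adjacent out-of-order elements; stop early when a
    full pass makes no exchanges (the list is then sorted)."""
    arr = list(arr)
    n = len(arr)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):          # larger elements bubble right
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:                     # no exchanges: already sorted
            break
    return arr


print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```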
Insertion Sort
•Insertion sort is a very simple method to sort numbers in an ascending or descending
order.
•This method follows the incremental method.
•It can be compared with the technique how cards are sorted at the time of playing a
game.
•This is an in-place comparison-based sorting algorithm.
•Here, a sub-list is maintained which is always sorted.
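The card-sorting idea above can be sketched in Python: the front of the list is the always-sorted sub-list, and each new element is inserted into its place (a minimal illustration):

```python
def insertion_sort(arr):
    """Maintain a sorted sub-list at the front; insert each new element
    into its correct position, like sorting playing cards in hand."""
    arr = list(arr)
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > key:  # shift larger elements right
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key                # insert into the sorted sub-list
    return arr


print(insertion_sort([7, 3, 5, 1]))  # [1, 3, 5, 7]
```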
Selection sort
•Selection sort is a simple sorting algorithm.
•This sorting algorithm, like insertion sort, is an in-place comparison-based algorithm
in which the list is divided into two parts, the sorted part at the left end and the
unsorted part at the right end.
•Initially, the sorted part is empty and the unsorted part is the entire list.
•The smallest element is selected from the unsorted array and swapped with the
leftmost element, and that element becomes a part of the sorted array.
•This process continues moving unsorted array boundaries by one element to the
right.
•This algorithm is not suitable for large data sets as its average and worst case
complexities are of O(n2), where n is the number of items.
•This type of sorting is called Selection Sort as it works by repeatedly sorting
elements.
•That is we first find the smallest value in the array and exchange it with the element in
the first position, then find the second smallest element and exchange it with the
element in the second position, and we continue the process in this way until the entire
array is sorted.
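The process above maps to a short Python sketch: the sorted part grows from the left as the smallest remaining element is swapped into place each pass (a minimal illustration):

```python
def selection_sort(arr):
    """Grow the sorted part at the left by repeatedly selecting the
    smallest element of the unsorted part and swapping it into place."""
    arr = list(arr)
    for i in range(len(arr) - 1):
        # Index of the smallest element in the unsorted part arr[i:].
        smallest = min(range(i, len(arr)), key=arr.__getitem__)
        arr[i], arr[smallest] = arr[smallest], arr[i]
    return arr


print(selection_sort([64, 25, 12, 22, 11]))  # [11, 12, 22, 25, 64]
```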
Merge sort
•Merge sort is a sorting technique based on divide and conquer technique.
•With worst-case time complexity being Ο(n log n), it is one of the most used and
approached algorithms.
•Merge sort first divides the array into equal halves and then combines them in a sorted
manner.
•Merge sort keeps on dividing the list into equal halves until it can no more be divided.
•By definition, if it is only one element in the list, it is considered sorted.
•Then, merge sort combines the smaller sorted lists keeping the new list sorted too.
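The divide-and-merge steps above can be sketched as follows (a minimal Python illustration; real implementations often merge in place to save memory):

```python
def merge_sort(arr):
    """Divide the list into halves until single elements remain, then
    merge the sorted halves back together, keeping the result sorted."""
    if len(arr) <= 1:              # one element is sorted by definition
        return list(arr)
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])

    # Merge: repeatedly take the smaller front element of the two halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]


print(merge_sort([38, 27, 43, 3, 9]))  # [3, 9, 27, 38, 43]
```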
Shell sort
•Shell sort is a highly efficient sorting algorithm and is based on insertion sort algorithm.
•This algorithm avoids large shifts as in case of insertion sort, if the smaller value is to the
far right and has to be moved to the far left.
•This algorithm uses insertion sort on a widely spread elements, first to sort them and
then sorts the less widely spaced elements. This spacing is termed as interval.
•This algorithm is quite efficient for medium-sized data sets. Its time complexity depends
on the gap sequence used; in the worst case it is O(n2), though certain gap sequences
improve this (for example, to O(n log2 n)).
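The interval idea can be sketched with the simple gap-halving sequence (a minimal Python illustration; other gap sequences exist and change the performance):

```python
def shell_sort(arr):
    """Insertion sort on elements one 'gap' apart, halving the gap each
    round; large moves happen early, avoiding the long shifts of plain
    insertion sort."""
    arr = list(arr)
    gap = len(arr) // 2
    while gap > 0:
        for i in range(gap, len(arr)):
            key, j = arr[i], i
            while j >= gap and arr[j - gap] > key:
                arr[j] = arr[j - gap]   # shift within the gap interval
                j -= gap
            arr[j] = key
        gap //= 2                       # narrow the interval
    return arr


print(shell_sort([35, 33, 42, 10, 14, 19, 27, 44]))
```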
Heap Sort
•Heap Sort is an efficient sorting technique based on the heap data structure.
•A heap is a complete binary tree in which every parent node is either less than or equal
to its children (min-heap) or greater than or equal to its children (max-heap).
•A heap whose root holds the minimum value is called a min-heap, and one whose root
holds the maximum value is called a max-heap.
•The elements in the input data of the heap sort algorithm are processed using these two
methods.
•The time complexity of the heap sort algorithm is O(nlogn), similar to merge sort.
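The idea can be sketched using Python's heapq min-heap: heapify the input, then remove the root n times, each removal costing O(log n) (the classic algorithm instead builds a max-heap in place; this is a compact illustration):

```python
import heapq

def heap_sort(arr):
    """Build a min-heap, then repeatedly remove the root (the minimum).
    Heapify is O(n); each of the n removals is O(log n), so the total is
    O(n log n)."""
    heap = list(arr)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]


print(heap_sort([12, 11, 13, 5, 6, 7]))  # [5, 6, 7, 11, 12, 13]
```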
Quick sort
•Quick sort is a highly efficient sorting algorithm and is based on partitioning of array of
data into smaller arrays.
•A large array is partitioned into two arrays one of which holds values smaller than the
specified value, say pivot, based on which the partition is made and another array holds
values greater than the pivot value.
•Quick sort partitions an array and then calls itself recursively twice to sort the two
resulting subarrays.
•The worst case complexity of Quick-Sort algorithm is O(n2). However, using this
technique, in average cases generally we get the output in O (n log n) time.
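The partition-then-recurse idea above can be sketched as follows (a minimal, non-in-place Python illustration; production versions partition the array in place to save memory):

```python
def quick_sort(arr):
    """Partition around a pivot value, then sort the two resulting
    subarrays recursively."""
    if len(arr) <= 1:
        return list(arr)
    pivot = arr[len(arr) // 2]
    smaller = [x for x in arr if x < pivot]   # values below the pivot
    equal = [x for x in arr if x == pivot]
    larger = [x for x in arr if x > pivot]    # values above the pivot
    return quick_sort(smaller) + equal + quick_sort(larger)


print(quick_sort([10, 80, 30, 90, 40, 50, 70]))  # [10, 30, 40, 50, 70, 80, 90]
```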
Searching Algorithms
•Searching is a process of finding a particular record, which can be a single element or
a small chunk, within a huge amount of data.
•The data can be in various forms: arrays, linked lists, trees, heaps, and graphs etc.
•With the increasing amount of data nowadays, there are multiple techniques to perform
the searching operation.
•Searching Algorithms in Data Structures
•Various searching techniques can be applied on the data structures to retrieve certain
data.
•A search operation is said to be successful only if it returns the desired element or
data; otherwise, the searching method is unsuccessful.
There are two categories of searching techniques.
Sequential Searching
Interval Searching
Sequential Searching
•As the name suggests, the sequential searching operation traverses through each
element of the data sequentially to look for the desired data.
•The data need not be in a sorted manner for this type of search.
Example − Linear Search
Interval Searching
•Unlike sequential searching, the interval searching operation requires the data to be in
a sorted manner.
•This method usually searches the data in intervals; it could be done by either dividing
the data into multiple sub-parts or jumping through the indices to search for an element.
Example − Binary Search, Jump Search etc.
Various Searching Algorithms
a)Linear Search Algorithm
•Linear search is a type of sequential searching algorithm.
•In this method, every element within the input array is traversed and compared with the
key element to be found.
•If a match is found in the array the search is said to be successful; if there is no match
found the search is said to be unsuccessful and gives the worst-case time complexity.
•The algorithm for linear search is relatively simple.
•Linear search traverses through every element sequentially therefore, the best case is
when the element is found in the very first iteration.
•The best-case time complexity would be O(1).
•However, the worst case of the linear search method would be an unsuccessful search
that does not find the key value in the array, it performs n iterations.
•Therefore, the worst-case time complexity of the linear search algorithm would be
O(n).
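The behaviour described above fits in a few lines (a minimal Python sketch; names and values are illustrative):

```python
def linear_search(arr, key):
    """Compare every element with the key sequentially. Best case O(1)
    (match in the first slot); worst case O(n) (key absent, n iterations)."""
    for i, value in enumerate(arr):
        if value == key:
            return i          # successful search: return the index
    return -1                 # unsuccessful search


print(linear_search([9, 4, 7, 2], 7))   # 2
print(linear_search([9, 4, 7, 2], 5))   # -1
```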
b) Binary Search Algorithm
•Binary search is a fast search algorithm with run-time complexity of Ο(log n).
•This search algorithm works on the principle of divide and conquer, since it divides the
array into half before searching.
•For this algorithm to work properly, the data collection should be in the sorted form.
•Binary search looks for a particular key value by comparing the middle most item of
the collection.
•If a match occurs, then the index of item is returned.
•But if the middle item has a value greater than the key value, the left sub-array of the
middle item is searched.
•Otherwise, the right sub-array is searched.
•This process continues recursively until the size of a subarray reduces to zero.
•Binary Search algorithm is an interval searching method that performs the searching
in intervals only.
•The input taken by the binary search algorithm must always be a sorted array, since it
divides the array into subarrays based on greater or lower values.
•The time complexity of the binary search algorithm is O(log n).
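The halving process above can be sketched iteratively (a minimal Python illustration; the array must already be sorted):

```python
def binary_search(arr, key):
    """arr must be sorted. Compare the middle item: search the left
    subarray if the middle is greater than the key, the right otherwise.
    O(log n) time."""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == key:
            return mid            # match: return the index
        elif arr[mid] > key:
            high = mid - 1        # search the left subarray
        else:
            low = mid + 1         # search the right subarray
    return -1                     # subarray shrank to zero: not found


print(binary_search([10, 14, 19, 26, 27, 31], 27))  # 4
```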
c) Interpolation Search Algorithm
•Interpolation search is an improved variant of binary search.
•This search algorithm works on the probing position of the required value.
•For this algorithm to work properly, the data collection should be in a sorted form and
equally distributed.
•The runtime complexity of the interpolation search algorithm is Ο(log (log n)), as
compared to the Ο(log n) of binary search, in favorable situations.
d) Jump Search Algorithm
•Jump Search algorithm is a slightly modified version of the linear search algorithm.
•The main idea behind this algorithm is to reduce the time complexity by comparing lesser
elements than the linear search algorithm.
•The input array must therefore be sorted; it is divided into blocks, and searching is
performed while jumping through these blocks.
•The time complexity of the jump search technique is O(√n) and space complexity is O(1).
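The block-jumping idea can be sketched with a block size of √n (a minimal Python illustration; the array must already be sorted):

```python
import math

def jump_search(arr, key):
    """arr must be sorted. Jump ahead in blocks of size sqrt(n) until a
    block that may contain the key is found, then scan it linearly."""
    n = len(arr)
    step = int(math.sqrt(n)) or 1
    prev = 0
    # Jump while the last element of the current block is still too small.
    while prev < n and arr[min(prev + step, n) - 1] < key:
        prev += step
    # Linear search inside the candidate block.
    for i in range(prev, min(prev + step, n)):
        if arr[i] == key:
            return i
    return -1


print(jump_search([0, 1, 1, 2, 3, 5, 8, 13, 21, 34], 8))  # 6
```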
e) Exponential Search Algorithm
•Exponential search algorithm targets a range of an input array in which it assumes that
the required element must be present in and performs a binary search on that particular
small range.
•This algorithm is also known as doubling search or finger search.
•It is similar to jump search in dividing the sorted input into multiple blocks and conducting
a smaller scale search.
•However, the difference occurs while performing computations to divide the blocks and
the type of smaller scale search applied (jump search applies linear search and
exponential search applies binary search).
•Hence, this algorithm jumps exponentially in the powers of 2.
•In simpler words, the search is performed on blocks divided using pow(2, k), where k is
an integer greater than or equal to 0.
•Once the element at position pow(2, k) is greater than the key element, binary search is
performed on the current block.
•Even though it is called Exponential search it does not perform searching in exponential
time complexity.
•But as we know, in this search algorithm, the basic search being performed is binary
search.
•Therefore, the time complexity of the exponential search algorithm will be the same as
the binary search algorithm’s, O(log n).
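The doubling-then-binary-search process can be sketched as follows (a minimal Python illustration; the array must already be sorted):

```python
def exponential_search(arr, key):
    """arr must be sorted. Double the probe index (1, 2, 4, 8, ...) until
    the element there exceeds the key, then binary-search that block."""
    if not arr:
        return -1
    if arr[0] == key:
        return 0
    i = 1
    while i < len(arr) and arr[i] <= key:
        i *= 2                            # jump in powers of 2

    # Binary search between the previous jump and the current bound.
    low, high = i // 2, min(i, len(arr) - 1)
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == key:
            return mid
        elif arr[mid] > key:
            high = mid - 1
        else:
            low = mid + 1
    return -1


print(exponential_search([2, 3, 4, 10, 40, 45, 60], 10))  # 3
```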
f) Fibonacci Search Algorithm
•As the name suggests, the Fibonacci Search Algorithm uses Fibonacci numbers to
search for an element in a sorted input array.
•The Fibonacci series is a series of numbers that begins with the two seed values 0 and 1.
•Each successive number is the sum of the preceding two numbers in the series.
•The series is infinite but deterministic: every term is fixed by this rule.
•The main idea behind the Fibonacci series is also to eliminate the least possible places
where the element could be found.
•In a way, it acts like a divide & conquer algorithm (logic being the closest to binary
search algorithm).
•This algorithm, like jump search and exponential search, also skips through the indices
of the input array in order to perform searching.
•The Fibonacci Search Algorithm makes use of the Fibonacci Series to diminish the
range of an array on which the searching is set to be performed.
•With every iteration, the search range decreases making it easier to locate the element
in the array.
•The Fibonacci Search algorithm takes logarithmic time complexity to search for an
element.
•Since it is based on a divide-and-conquer approach similar in idea to binary search, the
worst-case time complexity of this algorithm is O(log n).
g) Sublist Search Algorithm
•Until now we have only seen how to search for one element in a sequential order of
elements.
•But the sublist search algorithm provides a procedure to search for a linked list in
another linked list.
•It works like any simple pattern matching algorithm where the aim is to determine
whether one list is present in the other list or not.
•The algorithm walks through the linked list where the first element of one list is
compared with the first element of the second list; if a match is not found, the second
element of the first list is compared with the first element of the second list.
•This process continues until a match is found or it reaches the end of a list.
•The main aim of this algorithm is to prove that one linked list is a sub-list of another
list.
•Searching in this process is done linearly, checking each element of the linked list one
by one; if the output returns true, then it is proven that the second list is a sub-list of
the first linked list.
•The time complexity of the sublist search depends on the number of elements present
in both linked lists involved.
•The worst case time taken by the algorithm to be executed is O(m*n) where m is the
number of elements present in the first linked list and n is the number of elements
present in the second linked list.
Emerging Technologies
Machine Learning
Machine learning is a type of artificial intelligence (AI) that allows computers to learn
and improve from data without being explicitly programmed.
It uses algorithms to analyze large amounts of data, identify patterns, and make
predictions.
Types of Machine Learning
Machine learning can be broadly
categorized into three types:
• Supervised Learning:
Trains models on labeled data to
predict or classify new, unseen data.
• Unsupervised Learning:
Finds patterns or groups in unlabeled
data, like clustering or dimensionality reduction.
• Reinforcement Learning:
Learns through trial and error to maximize rewards, ideal for decision-making tasks.
Learning Applications:
Computer vision
The first Deep Learning application is Computer vision.
In computer vision, Deep learning AI models can enable machines to identify and
understand visual data.
Some of the main applications of deep learning in computer vision include:
• Object detection and recognition: Deep learning model can be used to identify and
locate objects within images and videos, making it possible for machines to perform
tasks such as self-driving cars, surveillance, and robotics.
• Image classification: Deep learning models can be used to classify images into
categories such as animals, plants, and buildings. This is used in applications such
as medical imaging, quality control, and image retrieval.
• Image segmentation: Deep learning models can be used for image segmentation into
different regions, making it possible to identify specific features within images.
Reinforcement learning:
In reinforcement learning, deep learning works as training agents to take action in an
environment to maximize a reward. Some of the main applications of deep learning in
reinforcement learning include:
• Game playing: Deep reinforcement learning models have been able to beat human
experts at games such as Chess and Go.
• Robotics: Deep reinforcement learning models can be used to train robots to
perform complex tasks such as grasping objects, navigation, and manipulation.
• Control systems: Deep reinforcement learning models can be used to control
complex systems such as power grids, traffic management, and supply chain
optimization.
Cyber attacks
• A cyber attack is an intentional attempt to access a computer system, network,
or device to steal, alter, or destroy data.
• A cyber attack is any intentional effort to steal, expose, alter, disable, or
destroy data, applications, or other assets through unauthorized access to a
network, computer system or digital device.
Common types of cyber attacks
a) Malware
Malware is a term used to describe malicious software, including spyware,
ransomware, viruses, and worms.
Malware breaches a network through a vulnerability, typically when a user clicks a
dangerous link or email attachment that then installs risky software.
Once inside the system, malware can do the following:
• Blocks access to key components of the network (ransomware)
• Installs malware or additional harmful software
• Covertly obtains information by transmitting data from the hard drive (spyware)
• Disrupts certain components and renders the system inoperable
b) Phishing
Phishing is the practice of sending fraudulent communications that appear to come
from a reputable source, usually through email.
The goal is to steal sensitive data like credit card and login information or to install
malware on the victim’s machine.
Phishing is an increasingly common cyberthreat.
c) Man-in-the-middle attack
Man-in-the-middle (MitM) attacks, also known as eavesdropping attacks, occur when
attackers insert themselves into a two-party transaction.
Once the attackers interrupt the traffic, they can filter and steal data.
Two common points of entry for MitM attacks are:
1. On unsecure public Wi-Fi, attackers can insert themselves between a visitor’s device
and the network. Without knowing, the visitor passes all information through the
attacker.
2. Once malware has breached a device, an attacker can install software to process
all of the victim’s information.
d) Denial-of-service attack
A denial-of-service attack floods systems, servers, or networks with traffic to exhaust
resources and bandwidth.
As a result, the system is unable to fulfill legitimate requests.
Attackers can also use multiple compromised devices to launch this attack.
This is known as a distributed-denial-of-service (DDoS) attack.
e) SQL injection
A Structured Query Language (SQL) injection occurs when an attacker inserts
malicious code into a server that uses SQL and forces the server to reveal information
it normally would not.
An attacker could carry out a SQL injection simply by submitting malicious code into a
vulnerable website search box.
f) Zero-day exploit
A zero-day exploit hits after a network vulnerability is announced but before a patch or
solution is implemented.
Attackers target the disclosed vulnerability during this window of time.
Zero-day vulnerability threat detection requires constant awareness.
g) DNS Tunneling
• DNS tunneling utilizes the DNS protocol to communicate non-DNS traffic over port
53.
• It sends HTTP and other protocol traffic over DNS.
• There are various, legitimate reasons to utilize DNS tunneling.
• However, there are also malicious reasons to use DNS Tunneling VPN services.
• They can be used to disguise outbound traffic as DNS, concealing data that is
typically shared through an internet connection.
• For malicious use, DNS requests are manipulated to exfiltrate data from a
compromised system to the attacker’s infrastructure.
• It can also be used for command and control callbacks from the attacker’s
infrastructure to a compromised system.
Mitigation Strategies
Cyber risk mitigation is the application of policies, technologies and procedures to
reduce the likelihood and impact of a successful cyber attack.
It is a critical practice to help guide decision-making around risk control and mitigation
and allows your organization to stay protected and achieve its business goals.
Cybersecurity mitigation strategies are actions that can be taken to reduce the impact
of cyber attacks. These strategies include:
• Risk assessment: Evaluate your organization's level of risk and identify areas for
improvement
• Network access controls: Authenticate and authorize users who request access to
data or systems
• Incident response plan: Create a plan that describes how to respond to an attack to
minimize delays.
• Security patches and updates: Regularly update software, operating systems, and
applications to fix known vulnerabilities
• Network traffic monitoring: Continuously monitor network traffic to detect and
respond to cyber attacks
• Security awareness training: Educate employees about cybersecurity risks and how
to avoid them
• Multifactor authentication: Require users to provide at least two forms of ID
verification
• Firewall and threat detection software: Use software to detect and block threats
• Physical security: Review your organization's physical security measures
• Minimize attack surface: Reduce the number of ways that attackers can gain
access to your systems
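The multifactor authentication strategy above can be made concrete with a time-based one-time password (TOTP, RFC 6238), the scheme behind most authenticator apps. The sketch below uses only the Python standard library; the secret is the RFC's published test key, and the `login` helper is an illustrative stand-in for a real authentication flow:

```python
import hmac, hashlib, struct, time

def totp(secret: bytes, for_time=None, step=30, digits=6):
    """Time-based one-time password (RFC 6238) over HMAC-SHA1."""
    counter = int((time.time() if for_time is None else for_time) // step)
    msg = struct.pack(">Q", counter)                 # 8-byte big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                       # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def login(password_ok: bool, submitted_code: str, secret: bytes) -> bool:
    # The password alone is not enough; the one-time code is a second factor.
    return password_ok and hmac.compare_digest(submitted_code, totp(secret))

secret = b"12345678901234567890"   # RFC 6238 test-vector key; never hard-code a real one
print(totp(secret, for_time=59))   # RFC 6238 test vector: prints "287082"
```

Because the code changes every 30 seconds and is derived from a shared secret, a stolen password alone no longer grants access.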
Big Data
Data Mining
• Data mining is the process of sorting through large data sets to identify patterns and
relationships that can help solve business problems through data analysis.
• Data mining techniques and tools help enterprises to predict future trends and make
more informed business decisions.
• Data mining is the process of extracting knowledge or insights from large amounts
of data using various statistical and computational techniques.
• The data can be structured, semi-structured or unstructured, and can be stored in
various forms such as databases, data warehouses, and data lakes.
• The primary goal of data mining is to discover hidden patterns and relationships in
the data that can be used to make informed decisions or predictions.
• This involves exploring the data using various techniques such as clustering,
classification, regression analysis, association rule mining, and anomaly detection.
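As one illustration of the clustering technique listed above, here is a minimal k-means sketch in pure Python; the two groups of 2-D points stand in for, say, customer records, and are made up for the example:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means clustering over 2-D points, a common data mining technique."""
    random.seed(seed)
    centers = random.sample(points, k)       # pick k initial centers from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            i = min(range(k),
                    key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2)
            clusters[i].append((x, y))
        # Update step: move each center to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return centers, clusters

# Two well-separated groups of illustrative points.
data = [(1, 1), (1.5, 2), (2, 1.2), (8, 8), (8.5, 9), (9, 8.5)]
centers, clusters = kmeans(data, k=2)
print(sorted(len(c) for c in clusters))   # the two natural groups: [3, 3]
```

The "hidden pattern" the algorithm discovers here is simply that the points fall into two natural groups, which no record states explicitly.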
Data Mining Architecture
Data mining architecture refers to the overall design and structure of a data mining
system.
A data mining architecture typically includes several key components, which work
together to perform data mining tasks and extract useful insights and information from
data.
Some of the key components of a typical data mining architecture include:
• Data Sources: Data sources are the sources of data that are used in data mining.
These can include structured and unstructured data from databases, files, sensors,
and other sources. Data sources provide the raw data that is used in data mining and
can be processed, cleaned, and transformed to create a usable data set for analysis.
• Data Visualization: Data visualization is the process of presenting data and insights
in a clear and effective manner, typically using charts, graphs, and other
visualizations. Data visualization is an important part of data mining, as it allows data
miners to communicate their findings and insights to others in a way that is easy to
understand and interpret.
Types of Data Mining
There are many different types of data mining, but they can generally be grouped into
three broad categories: descriptive, predictive, and prescriptive.
• Descriptive data mining involves summarizing and describing the characteristics of
a data set. This type of data mining is often used to explore and understand the data,
identify patterns and trends, and summarize the data in a meaningful way.
• Predictive data mining involves using historical data to build models that forecast
future outcomes. This type of data mining is often used to anticipate trends, predict
customer behavior, or estimate risk, supporting forward-looking decisions.
• Prescriptive data mining involves using data and models to make recommendations
or suggestions about actions or decisions. This type of data mining is often used to
optimize processes, allocate resources, or make other decisions that can help
organizations achieve their goals.
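A tiny illustration of the predictive category: fitting a least-squares line to a short sales history and extrapolating one month ahead. The monthly figures are hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, a basic predictive model."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b

# Hypothetical monthly sales history: predict next month's sales.
months = [1, 2, 3, 4, 5]
sales  = [100, 120, 140, 160, 180]
a, b = fit_line(months, sales)
print(a * 6 + b)   # the trend is exactly +20 per month, so month 6 -> 200.0
```

Real predictive models (regression trees, neural networks, and so on) follow the same pattern: learn parameters from historical data, then apply them to unseen inputs.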
Data Warehousing
• A Data Warehouse is separate from DBMS, it stores a huge amount of data, which
is typically collected from multiple heterogeneous sources like files, DBMS, etc.
• The goal is to produce statistical results that may help in decision-making.
• An ordinary database can store MBs to GBs of data, and that too for a specific
purpose.
• For storing data of TB size, the storage shifted to the Data Warehouse.
• Besides this, a transactional database doesn’t lend itself to analytics.
• To effectively perform analytics, an organization keeps a central Data Warehouse
to closely study its business by organizing, understanding, and using its historical data
for making strategic decisions and analyzing trends.
Benefits of Data Warehouse
• Better business analytics: A data warehouse plays an important role in a business by
storing all of the company's past data and records, which deepens the company's
understanding and analysis of its data.
• Faster queries: The data warehouse is designed to handle large queries, so it runs
them faster than an ordinary database.
• Improved data quality: Data gathered from different sources is stored and analyzed
in the warehouse without being altered or added to, so data quality is maintained; if
any data-quality issue arises, the data warehouse team resolves it.
• Historical Insight: The warehouse stores all your historical data which contains
details about the business so that one can analyze it at any time and extract insights
from it.
Features of Data Warehousing
• Centralized Data Repository: Data warehousing provides a centralized repository
for all enterprise data from various sources, such as transactional databases,
operational systems, and external sources. This enables organizations to have a
comprehensive view of their data, which can help in making informed business
decisions.
• Data Integration: Data warehousing integrates data from different sources into a
single, unified view, which can help in eliminating data silos and reducing data
inconsistencies.
• Historical Data Storage: Data warehousing stores historical data, which enables
organizations to analyze data trends over time. This can help in identifying patterns
and anomalies in the data, which can be used to improve business performance.
• Query and Analysis: Data warehousing provides powerful query and analysis
capabilities that enable users to explore and analyze data in different ways. This can
help in identifying patterns and trends, and can also help in making informed business
decisions.
• Data Transformation: Data warehousing includes a process of data transformation,
which involves cleaning, filtering, and formatting data from various sources to make it
consistent and usable. This can help in improving data quality and reducing data
inconsistencies.
• Data Mining: Data warehousing provides data mining capabilities, which enable
organizations to discover hidden patterns and relationships in their data. This can help
in identifying new opportunities, predicting future trends, and mitigating risks.
• Data Security: Data warehousing provides robust data security features, such as
access controls, data encryption, and data backups, which ensure that the data is
secure and protected from unauthorized access.
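The data transformation feature described above (cleaning, filtering, and formatting data from various sources) can be sketched as a small routine. The source records, their field names, and their inconsistent formats are all hypothetical:

```python
from datetime import datetime

# Raw records from two hypothetical source systems with inconsistent formats.
raw = [
    {"id": "1", "amount": "1,200.50", "date": "2024-03-05"},
    {"id": "2", "amount": "980",      "date": "05/03/2024"},
    {"id": "2", "amount": "980",      "date": "05/03/2024"},   # duplicate row
]

def transform(records):
    """Deduplicate records and normalize amounts and dates into one shape."""
    seen, out = set(), []
    for r in records:
        if r["id"] in seen:
            continue                                  # drop duplicate rows
        seen.add(r["id"])
        amount = float(r["amount"].replace(",", ""))  # strip thousands separators
        fmt = "%Y-%m-%d" if "-" in r["date"] else "%d/%m/%Y"
        date = datetime.strptime(r["date"], fmt).date().isoformat()
        out.append({"id": int(r["id"]), "amount": amount, "date": date})
    return out

clean = transform(raw)
print(clean)   # two records, consistent types, ISO dates
```

In a real warehouse this step runs inside an ETL (extract, transform, load) pipeline, but the principle is the same: one consistent shape for data arriving in many shapes.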
Data Visualization
Big Data Visualization refers to the techniques and tools used to graphically represent
large and complex datasets in a way that is easy to understand and interpret.
Given the volume, variety, and velocity of big data, traditional visualization methods
often fall short, requiring more sophisticated approaches to make sense of such vast
amounts of information.
Big Data Visualization is characterized by its ability to handle:
• Volume: The sheer amount of data requires robust visualization tools that can
summarize and present data effectively without losing critical information.
• Variety: Data comes in various formats (structured, unstructured, semi-structured)
from diverse sources. Effective visualizations must accommodate this diversity.
• Velocity: Data streams in at high speeds, especially from sources like social media
and IoT devices. Real-time visualization is often necessary to keep pace with the influx.
Importance of Big Data Visualization
1. Enhanced Decision-Making
Big data visualization provides a clear and immediate understanding of data, enabling
businesses and organizations to make informed decisions quickly.
2. Improved Data Comprehension
It aids in transforming raw numbers into meaningful insights, making it easier for
analysts and non-technical users to understand the underlying trends and
relationships.
3. Increased Engagement
Interactive visualizations can engage users more effectively than static reports.
Dashboards and real-time visual data representations allow users to interact with the
data, drilling down into specifics and exploring different scenarios, leading to deeper
insights and more meaningful engagement with the information.
Network Diagrams
Network diagrams are particularly valuable for visualizing complex interconnections
between entities, making them essential in fields such as social network analysis,
internet infrastructure, and bioinformatics.
• These diagrams are composed of nodes (representing entities) and edges
(representing relationships).
• They are ideal for visualizing relationships and interactions in data, such as social
connections or data flows.
Geospatial Maps
Geospatial maps integrate geographical data with analytical data, providing a spatial
dimension to the visualization. This type of visualization is crucial for any data with a
geographic component, ranging from global economic trends to local event planning.
Integrating large-scale geographical data with traditional data sets to provide spatial
analysis.
Tree Maps
Tree maps display hierarchical data as a set of nested rectangles, where each branch
of the tree is given a rectangle, which is then tiled with smaller rectangles representing
sub-branches. This method is useful for visualizing part-to-whole relationships within a
dataset.
Stream Graphs
Stream graphs are a type of stacked area graph that is ideal for representing changes
in data over time. They are often used to display volume fluctuations in streams of
data and are particularly effective in showing the presence of patterns and trends
across different categories.
Parallel Coordinates
Parallel coordinates are used for plotting individual data elements across multiple
dimensions. They are particularly useful for visualizing and analyzing multivariate
data, allowing users to see correlations and patterns across several interconnected
variables.
Chord Diagrams
Chord diagrams are used to show inter-relationships between data points in a circle.
The arcs connect data points, and their thickness or color can indicate the weight or
value of the relationship, useful for representing connectivity or flows within a system.
These diagrams help identify patterns, clusters, or trends within complex datasets,
aiding in decision-making and analysis.
Applications of Big Data Visualization
1. Healthcare
2. Finance
3. Retail
4. Manufacturing
5. Marketing
6. Transport and logistics
Information Technology Applications
1. E-commerce
•E-commerce, or electronic commerce, is the buying and selling of goods and services,
or the transfer of funds or data, over an electronic network, primarily the internet.
•E-commerce draws on technologies such as mobile commerce, electronic funds
transfer, supply chain management, Internet marketing, online transaction processing,
electronic data interchange (EDI), inventory management systems, and automated
data collection systems.
Online Shopping
•Online shopping is the act of buying goods or services over the internet using a
computer, smartphone, or tablet.
•Online shopping is a form of electronic commerce, also known as "ecommerce".
Ecommerce businesses can be online-only or have a physical presence as well.
•Some advantages of online shopping include safety, convenience, better prices,
variety, authenticity, no-pressure shopping, and time savings.
•Online shopping is becoming more and more widespread and accepted due to these
many conveniences.
Online Business
•An online business is a business that conducts its primary activities over the
internet, such as buying, selling, or providing services online.
•Online businesses can be started by anyone and can be inexpensive to get off the
ground.
•Here are some examples of online businesses:
•E-commerce: A business that sells products or services online, or uses the internet to
generate sales leads. E-commerce businesses can be run from a single website or
through multiple online channels.
•Selling handmade goods: You can sell handmade products on sites like Etsy, Amazon,
and eBay.
•Selling art: You can sell your art as prints, canvases, framed posters, or digital
downloads. You can also use a print-on-demand service to have your artwork printed
on mugs, T-shirts, or other goods.
•Affiliate marketing: You can promote other businesses' products through affiliate
marketing.
•Software development: You can develop software solutions.
E-Learning
•E-learning, or online learning, is a way of learning that uses the internet and other
digital technologies to deliver instruction.
•It's an alternative to traditional classroom learning.
•E-learning courses can be accessed through electronic devices like computers,
tablets, and cell phones.
•E-learning courses can be live, pre-recorded, or a combination of both.
•E-learning courses can include videos, quizzes, games, and other interactive
elements.
•Learning management systems can help facilitate e-learning by storing courses,
assessments, and grades.
•Benefits of e-learning include:
•Flexibility: Students can learn at their own pace and from any location.
•Cost-effective: There's no need to spend money on travel, seminars, and hotel rooms.
•Wider coverage: E-learning can reach a large number of people in many different
locations.
•Personalized: E-learning can be tailored to the needs of individual learners.
Online Education
•Online education, also known as distance learning, e-learning, or remote learning, is
a way to learn and teach that uses the internet.
•Online education can be used in all sectors of education, from elementary schools to
higher education. It can be an effective alternative to traditional classroom education.
•It can include: Watching videos, Reading articles, Taking online courses, Interacting
with teachers and other students online, and Submitting work electronically.
•Online education can be flexible and allow students to study at their own pace. There
are several types of online learning, including:
Asynchronous
Students complete coursework and exams within a given time frame, and interaction
usually takes place through discussion boards, blogs, and wikis.
Synchronous
Students and the instructor interact online simultaneously through text, video, or audio
chat.
Distance Learning: Distance learning refers to the way of learning that does not
require you to be present physically at the university or institution.
•Learning materials and lectures are available online.
•Learners can stay at their homes while taking the course from an online university or
other institution.
•They will usually also have the opportunity to attend in-person workshops, residencies,
or other learning components, but the material is primarily taught through online
courses.
2. BharatNet:
High-speed broadband connectivity to over 2.5 lakh Gram Panchayats.
4. Digital Locker:
A secure cloud-based platform for storing and sharing government-issued documents.
10. eCourts:
Digitization of judiciary for faster case resolutions.
Impact of Digital India
•Increased Connectivity: Internet penetration has improved significantly in rural areas.
•Digital Payments: Widespread adoption of digital payment methods like UPI and mobile
wallets.
•Transparency: Reduction in corruption due to direct benefit transfers and e-
governance.
•Job Creation: Enhanced opportunities in IT, electronics manufacturing, and e-
services.
•Inclusive Growth: Bridging the digital divide between urban and rural areas. The Digital
India Mission continues to evolve, empowering citizens with digital tools, enhancing
infrastructure, and fostering innovation across the country.