Biosec Notes
Chapter 1
Q. What is Data Abstraction?
Data abstraction is the process of hiding the complex implementation details of data and only
revealing the essential features or behavior to the outside world. It allows users to interact with data
at a higher level, without needing to understand the intricate inner workings. This helps in
managing complexity, enhancing security, and facilitating easier maintenance and understanding of
software systems.
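For instance, here is a minimal Python sketch of data abstraction (the class and method names are invented for this example): a stack exposes push and pop while hiding the list that actually stores its elements.

```python
class Stack:
    """Exposes push/pop/peek while hiding the underlying storage."""

    def __init__(self):
        self._items = []  # internal detail: callers never touch this list directly

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        if not self._items:
            raise IndexError("peek at empty stack")
        return self._items[-1]


s = Stack()
s.push(1)
s.push(2)
print(s.pop())  # 2 -- the caller never sees the list behind the interface
```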
Chapter 2
Q. What are Domain & Data dependency?
Domain and data dependency refer to the relationship between the domain of a system (the area or field it is designed to operate within) and the data it relies on to function effectively. This concept is
critical in fields such as data science, machine learning, software engineering, and database
management.
Domain Dependency:
Domain dependency involves the constraints and requirements imposed by the specific domain in
which a system operates. The domain can be an industry (e.g., healthcare, finance), a specific
application area (e.g., natural language processing, image recognition), or a particular problem
space (e.g., predictive maintenance, fraud detection).
Data Dependency:
Data dependency covers the characteristics of the data a system relies on to function, including:
Data Quality: High-quality data is essential for the accuracy and reliability of a system. Poor data quality can lead to errors and inefficiencies.
Data Variety: The types of data (structured, unstructured, semi-structured) and their formats (text,
image, video) that a system can process. This variety impacts how data is collected, stored, and
analyzed.
Data Volume: The amount of data required can vary significantly. Systems must be designed to
handle appropriate data volumes, whether big data scenarios or smaller datasets.
Data Source: The origin of data (internal databases, external APIs, sensor data) influences its
relevance and reliability. Trustworthy sources are critical for dependable system performance.
Data Dynamics: How frequently data changes and how the system needs to adapt to these changes.
Real-time data processing systems have different requirements compared to those dealing with
static historical data.
Practical Examples:
Healthcare: An AI system for diagnosing diseases depends on high-quality medical data, including
patient records, imaging data, and lab results. The system must comply with healthcare regulations
and use medical terminologies.
Finance: Fraud detection systems rely on transaction data, user behavior patterns, and financial
regulations. They require real-time data processing and integration with financial databases.
Retail: Recommendation systems in e-commerce depend on customer purchase history, browsing
behavior, and product data. The domain's focus on user experience and personalized marketing
influences system design.
Chapter 3
Q. Explain Indices. Types of Indices.
Indices in Database Management Systems (DBMS) are specialized data structures that improve the speed of data retrieval operations on database tables, at the cost of additional storage space and slower writes. They allow the database engine to find and retrieve specific rows much faster than scanning the entire table.
Types:
Primary Index: A primary index is an index that is automatically created on the primary key of a
database table. It helps in quickly locating records because it ensures that each value in the primary
key column is unique and sorted.
Secondary Index: A secondary index is an index created on columns that are not the primary key.
It provides an additional way to access data more quickly based on the values in these columns.
Clustered Index: A clustered index sorts and stores the actual data rows of the table based on the
index key values. It directly affects the order in which the rows are stored on disk.
Non-Clustered Index: A non-clustered index is an index that creates a separate structure from the
actual data rows, containing pointers to those rows. It doesn't alter the order of the data in the table.
Unique Index: A unique index ensures that all values in the indexed column are distinct,
preventing duplicate entries.
Bit-Map Index: A bitmap index uses bitmaps (arrays of bits) to represent the presence of values in
a column, making it efficient for certain types of queries.
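As a rough Python sketch (the table and column names are made up for illustration), a secondary index can be modeled as a mapping from a column value to the positions of matching rows, letting a lookup skip the full-table scan:

```python
# Toy table: a list of rows, each row a dict. The secondary index on "city"
# maps each city value to the positions of the rows that contain it.
rows = [
    {"id": 1, "name": "Asha", "city": "Pune"},
    {"id": 2, "name": "Ben", "city": "Delhi"},
    {"id": 3, "name": "Chen", "city": "Pune"},
]

city_index = {}
for pos, row in enumerate(rows):
    city_index.setdefault(row["city"], []).append(pos)

# Indexed lookup: jump straight to matching rows instead of scanning them all.
print([rows[pos]["name"] for pos in city_index.get("Pune", [])])  # ['Asha', 'Chen']
```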
Q. What is B-tree?
A B-tree is a balanced tree data structure used for organizing and managing large amounts of data. It maintains data in sorted order and supports efficient search, insertion, and deletion operations.
B-trees consist of nodes that contain multiple keys and children, allowing for broad branching and
keeping the tree height shallow. They automatically balance themselves after insertions and
deletions to maintain their structure. B-trees are disk-friendly, minimizing disk I/O operations by
storing keys in large nodes. They are widely used in database management systems for indexing and
organizing data efficiently.
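A minimal Python sketch of the search path through a B-tree (insertion and rebalancing are omitted, and the node layout is simplified for illustration):

```python
class BTreeNode:
    """A node holding sorted keys; children[i] covers keys less than keys[i]."""

    def __init__(self, keys, children=None):
        self.keys = keys                 # sorted list of keys in this node
        self.children = children or []   # empty list for leaf nodes


def btree_search(node, key):
    """Return True if key is present in the subtree rooted at node."""
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1                           # find the first key >= search key
    if i < len(node.keys) and node.keys[i] == key:
        return True                      # found in this node
    if not node.children:
        return False                     # reached a leaf without a match
    return btree_search(node.children[i], key)


# A small tree: root [10, 20] with three children.
root = BTreeNode([10, 20],
                 [BTreeNode([2, 5]), BTreeNode([13, 17]), BTreeNode([25, 30])])
print(btree_search(root, 17))  # True
print(btree_search(root, 4))   # False
```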
Q. What is Hashing?
Hashing is a technique used to map data to a fixed-size array, known as a hash table, using a hash
function. This function takes an input (often called a key) and produces a fixed-size output, called a
hash value or hash code. The hash value is then used as an index to store or retrieve the associated
data in the hash table. Hashing allows for efficient data retrieval and storage, as accessing elements
in a hash table typically takes constant time on average, regardless of the size of the data set.
However, hash functions should ideally distribute keys evenly across the hash table to minimize
collisions, where multiple keys map to the same hash value. Hashing is widely used in various
applications, including databases, caching mechanisms, and cryptographic algorithms, due to its
speed and efficiency in data access.
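A toy Python sketch of a hash table that resolves collisions by chaining (the names and table size are arbitrary):

```python
class HashTable:
    """Fixed-size hash table using chaining to resolve collisions."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % len(self.buckets)  # hash function maps key -> bucket

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)       # overwrite an existing key
                return
        bucket.append((key, value))            # collision -> chain in the bucket

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)


t = HashTable()
t.put("alice", 30)
t.put("bob", 25)
print(t.get("alice"))  # 30
```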
Chapter 4
Q. Explain Concurrency Control in detail.
Concurrency control is a crucial aspect of database management systems that ensures the
consistency and correctness of data when multiple transactions are executed concurrently. It deals
with managing simultaneous access to shared resources, such as database records, by multiple users
or processes. The primary goal of concurrency control is to maintain data integrity while allowing
concurrent transactions to execute efficiently. This involves preventing certain types of conflicts,
such as lost updates, uncommitted data, and inconsistent reads, which can arise due to concurrent
execution.
Concurrency control mechanisms typically include locking, timestamping, and optimistic
concurrency control techniques:
Locking: Involves acquiring locks on database objects (e.g., rows, tables) to prevent other transactions from accessing them concurrently. Locks can be exclusive (write locks) or shared (read locks), and they are released once the transaction completes (a toy sketch of lock compatibility follows this list).
Timestamping: Assigns a unique timestamp to each transaction based on its start time. By comparing timestamps, the system orders conflicting operations; a transaction whose operation would violate this order is typically rolled back and restarted with a new timestamp.
Optimistic Concurrency Control: Assumes that conflicts between transactions are rare and
allows them to proceed without acquiring locks initially. Before committing, the system
checks for conflicts. If conflicts are detected, the transaction is rolled back and retried.
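Here is the promised toy Python sketch of lock compatibility (the lock-manager API is invented for illustration): shared locks coexist, while an exclusive lock conflicts with everything else.

```python
class LockManager:
    """Toy lock table: many shared holders OR one exclusive holder per object."""

    def __init__(self):
        self.locks = {}  # object id -> {"mode": "S" or "X", "holders": set}

    def acquire(self, txn, obj, mode):
        entry = self.locks.get(obj)
        if entry is None:
            self.locks[obj] = {"mode": mode, "holders": {txn}}
            return True
        if mode == "S" and entry["mode"] == "S":
            entry["holders"].add(txn)   # shared locks are compatible
            return True
        return False                    # any other combination conflicts

    def release(self, txn, obj):
        entry = self.locks.get(obj)
        if entry and txn in entry["holders"]:
            entry["holders"].discard(txn)
            if not entry["holders"]:
                del self.locks[obj]     # last holder gone -> lock removed


lm = LockManager()
print(lm.acquire("T1", "row42", "S"))  # True  (first reader)
print(lm.acquire("T2", "row42", "S"))  # True  (readers share)
print(lm.acquire("T3", "row42", "X"))  # False (writer must wait)
```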
Concurrency control mechanisms ensure that transactions maintain the ACID (Atomicity,
Consistency, Isolation, Durability) properties, even in a multi-user environment. Atomicity
guarantees that transactions are either fully executed or not executed at all. Consistency ensures that
the database remains in a valid state before and after transaction execution. Isolation ensures that
each transaction appears to execute in isolation from other transactions, regardless of actual
concurrency. Durability ensures that committed changes are permanently saved even in the event of
system failures.
Multi-version Concurrency Control (MVCC): MVCC allows multiple versions of the same data
item to coexist in the database at the same time. When a transaction updates a data item, instead of
overwriting the existing value, MVCC creates a new version of the data item. Each version is
associated with a timestamp or a system version number that indicates when it was created.
Transactions read the most recent committed version of a data item that is consistent with their own
timestamp. This allows for read consistency without blocking read operations, as readers can access
the appropriate version of the data item without waiting for exclusive locks to be released.
MVCC is commonly used in database systems like PostgreSQL and Oracle, where it provides a
high degree of concurrency while ensuring read consistency and avoiding the need for extensive
locking.
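A toy Python sketch of the MVCC read rule described above (not how PostgreSQL or Oracle implement it internally): each write appends a timestamped version, and a reader picks the newest committed version no later than its own timestamp.

```python
class MVCCStore:
    """Toy multi-version store: each write appends a (commit_ts, value) version."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), in commit order

    def write(self, key, value, commit_ts):
        self.versions.setdefault(key, []).append((commit_ts, value))

    def read(self, key, txn_ts):
        """Return the newest committed value visible at txn_ts, without locking."""
        visible = None
        for ts, value in self.versions.get(key, []):
            if ts <= txn_ts:
                visible = value  # newest version no later than the snapshot
        return visible


store = MVCCStore()
store.write("x", "v1", commit_ts=5)
store.write("x", "v2", commit_ts=9)
print(store.read("x", txn_ts=7))   # 'v1' -- a reader at ts 7 sees the old version
print(store.read("x", txn_ts=10))  # 'v2'
```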
Optimistic Concurrency Control (OCC): OCC is based on the assumption that conflicts between
transactions are rare. In OCC, transactions proceed without acquiring locks initially. Instead, they
perform their operations and check for conflicts only when they are ready to commit.
Before committing, a transaction compares its read set (the data items it read) and write set (the data
items it modified) with the current state of the database. If no conflicts are detected, the transaction
commits successfully. However, if conflicts are found (e.g., if another transaction modified a data
item that was read or written by the current transaction), the transaction is aborted and restarted.
OCC is suitable for environments with low contention and short transaction durations, as it
minimizes the overhead of acquiring and releasing locks. It is commonly used in scenarios like
optimistic replication and distributed databases, where conflicts are infrequent and concurrency is
crucial.
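A minimal Python sketch of the OCC validation step (the data layout is invented for illustration): before committing, the transaction checks that nothing in its read set has changed.

```python
def occ_validate(read_set, db_versions):
    """Commit check: succeed only if every item read is still at the same version.

    read_set:    {key: version observed when the transaction read it}
    db_versions: {key: current committed version in the database}
    """
    return all(db_versions.get(key) == ver for key, ver in read_set.items())


db = {"a": 3, "b": 7}
txn_reads = {"a": 3, "b": 7}        # versions observed during the transaction

print(occ_validate(txn_reads, db))  # True -> the transaction may commit

db["b"] = 8                         # a concurrent transaction commits a change
print(occ_validate(txn_reads, db))  # False -> abort and retry
```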
Chapter 5
Q. What does the term Database Security mean?
Database security in a Database Management System (DBMS) involves protecting the data within
the database from unauthorized access, misuse, or corruption. This is crucial for maintaining the
confidentiality, integrity, and availability of the data.
A core mechanism of database security is authentication, which verifies the identity of anyone requesting access. It typically works in three steps:
Credentials Submission: The user provides credentials, such as a username and password, biometric data, or a security token.
Verification: The system checks these credentials against a stored set of authorized credentials.
Access Granted/Denied: If the credentials match, the system grants access. If they do not match,
access is denied.
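As a sketch of the verification step in Python (illustrative, not a full production scheme), passwords are stored as salted hashes and checked with a constant-time comparison:

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None):
    """Return (salt, digest) via PBKDF2; only these are stored, never the password."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest


def verify(password, salt, stored_digest):
    """Recompute the hash and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)


salt, stored = hash_password("s3cret")  # done once, at registration
print(verify("s3cret", salt, stored))   # True  -> access granted
print(verify("wrong", salt, stored))    # False -> access denied
```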
Types of Authentication:
Password-Based Authentication: Users provide a username and a password. The system verifies
the password against a stored hash of the password.
Multi-Factor Authentication (MFA): Combines two or more authentication factors, such as:
Something You Know: Password or PIN.
Something You Have: Security token, smartphone.
Something You Are: Biometric data like fingerprints, facial recognition.
Biometric Authentication: Uses unique biological traits, such as fingerprints, iris scans, or voice
recognition.
Token-Based Authentication: Involves a physical or software token that generates a unique code
used along with a password.
Certificate-Based Authentication: Uses digital certificates issued by a trusted certificate authority
(CA) to verify identity.
OAuth/OpenID Connect: Allows users to authenticate using credentials from a third-party service
provider like Google or Facebook.
Q. What is Authorization? How Does Authorization Work? Types of Authorization Models.
Authorization is the process of determining and granting permissions or access rights to
authenticated users, allowing them to perform specific actions or access certain resources within a
system. While authentication verifies the identity of a user, authorization decides what an
authenticated user is allowed to do.
User Authentication: The user first authenticates themselves through a process like password entry,
biometrics, or multi-factor authentication.
Access Request: After successful authentication, the user requests access to specific resources or
actions.
Permission Evaluation: The system checks the user's permissions against the requested action.
This involves evaluating the user's roles, permissions, and applicable policies.
Access Granted/Denied: Based on the evaluation, the system either grants or denies access to the
requested resources or actions.
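A minimal Python sketch of the permission-evaluation and grant/deny steps (the user, resource, and action names are invented):

```python
# Toy permission table: each (user, resource) pair holds an explicit set of
# allowed actions.
permissions = {
    ("asha", "report.pdf"): {"read", "write"},
    ("ben", "report.pdf"): {"read"},
}


def check_access(user, resource, action):
    """Evaluate the authenticated user's permissions for the requested action."""
    allowed = permissions.get((user, resource), set())
    return action in allowed  # True -> access granted, False -> denied


print(check_access("ben", "report.pdf", "read"))   # True  -> access granted
print(check_access("ben", "report.pdf", "write"))  # False -> access denied
```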
DAC: The Discretionary Access Control (DAC) model lets the owner of a resource decide who may access it and with what permissions.
Advantages:
Flexibility: Easy to share resources and modify permissions as needed.
User Empowerment: Owners manage access to their own resources.
Disadvantages:
Security Risks: Potential for excessive or inappropriate permissions being granted.
Scalability Issues: Managing permissions for a large number of resources and users can be complex.
Use Cases:
File Systems: Commonly used in operating systems like Windows, Unix, and Linux.
Databases: Controls access to database objects based on user roles and permissions.
MAC: The Mandatory Access Control (MAC) model is a stringent access control method where
access permissions are regulated by a central authority based on security policies and
classifications. Unlike the Discretionary Access Control (DAC) model, individual users do not have
the ability to grant or modify access permissions for resources they create or own.
Advantages:
High Security: Ensures strict adherence to security policies, minimizing the risk of unauthorized access.
Consistency: Provides uniform enforcement of access rules across the organization.
Disadvantages:
Rigidity: Less flexible than DAC, as users cannot change permissions to meet specific needs.
Complexity: Can be complex to manage, especially in dynamic environments with changing security requirements.
Use Cases:
Military and Government: Commonly used in environments where security is paramount,
such as military and government institutions.
Classified Information: Suitable for managing access to classified or sensitive information
that requires stringent control.
RBAC: Role-Based Access Control (RBAC) is a widely used access control model that assigns
permissions to users based on their roles within an organization. Rather than managing individual
user permissions, access rights are grouped by roles, and users are assigned to these roles,
simplifying the management of permissions.
Advantages:
Simplified Management: Easier to manage and audit permissions as they are grouped by roles.
Scalability: Efficiently handles permissions in large organizations by minimizing the number of access control entries.
Disadvantages:
Rigidity: May lack flexibility for users with unique or overlapping roles, potentially requiring multiple role assignments.
Initial Setup Complexity: Defining roles and assigning permissions can be complex and time-consuming during the initial setup.
Use Cases:
Corporate Environments: Commonly used in businesses to manage employee access
based on their job roles (e.g., administrator, manager, employee).
Enterprise Systems: Suitable for systems where users' access needs are well-defined and
relatively stable.
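A short Python sketch of the RBAC idea (the roles and names are invented for the example): permissions attach to roles, and users acquire them only through role assignment.

```python
# Permissions are grouped by role rather than granted per user.
role_permissions = {
    "administrator": {"read", "write", "delete"},
    "manager": {"read", "write"},
    "employee": {"read"},
}
user_roles = {"asha": {"manager"}, "ben": {"employee"}}


def rbac_allows(user, action):
    """A user may perform an action if any assigned role carries the permission."""
    return any(action in role_permissions.get(role, set())
               for role in user_roles.get(user, set()))


print(rbac_allows("asha", "write"))  # True
print(rbac_allows("ben", "delete"))  # False
```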