SQL and NoSQL

Uploaded by parth

Unit III

Structured Query Language (SQL)

It is a standardized language specifically designed for managing and manipulating
relational databases. It allows users to perform a range of data operations like
querying, updating, inserting, and deleting records within a database. SQL also
facilitates database structure management by allowing the creation and alteration of
tables, views, and indexes.

SQL's primary functions:

1. Data Querying: SQL can retrieve specific data from large datasets using queries,
often with conditions and filters.
2. Data Manipulation: SQL enables modification of data records by allowing
insertions, updates, and deletions.
3. Database Management: SQL allows users to create and modify the database
structure itself, defining tables, setting primary keys, and creating indexes for efficient
data access.
4. Data Control: SQL can manage access to data and control database transactions,
ensuring data integrity and security.
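The four functions above can be sketched with Python's built-in sqlite3 module. This is an illustrative example only: the students table, its columns, and the sample rows are invented for the sketch, not taken from the text.

```python
import sqlite3

# In-memory database; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Database management: define a table with a primary key.
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, marks REAL)")

# Data manipulation: insert, update, and delete records.
cur.executemany("INSERT INTO students VALUES (?, ?, ?)",
                [(1, "Asha", 82.5), (2, "Ravi", 67.0), (3, "Meera", 91.0)])
cur.execute("UPDATE students SET marks = 70.0 WHERE id = 2")
cur.execute("DELETE FROM students WHERE id = 1")

# Data querying: retrieve rows matching a condition, with ordering.
cur.execute("SELECT name, marks FROM students WHERE marks > 60 ORDER BY marks DESC")
rows = cur.fetchall()
print(rows)   # [('Meera', 91.0), ('Ravi', 70.0)]
conn.close()
```

Data control (the fourth function) is handled with statements such as GRANT, REVOKE, COMMIT, and ROLLBACK, which depend on the database server's user model and so are not shown in this single-user sketch.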

Not Only SQL (NoSQL)

It is a class of database systems that diverges from the traditional, structured format of
relational databases. NoSQL databases are designed to handle unstructured or semi-structured
data, making them well-suited for applications requiring flexible, scalable, and high-
performance data storage solutions. They are often used for handling large volumes of data in
real-time, such as social media content, IoT data, or any application with rapidly changing or
vast amounts of data.

Key characteristics of NoSQL databases:

1. Schema Flexibility: Unlike relational databases, NoSQL databases do not require a
fixed schema, which allows the structure of data to evolve as needs change. This
makes it easier to store and manage unstructured or semi-structured data.
2. Horizontal Scalability: NoSQL databases can scale out by adding more servers
(horizontal scaling), which supports large-scale data storage across distributed
systems.
3. Varied Data Models: NoSQL databases support different data models such as
Document oriented, Key-Value stores, Column family stores and Graph databases.
4. Eventual Consistency: Many NoSQL databases use an "eventual consistency"
model, where updates to data may not immediately be visible across the entire system.
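Schema flexibility can be illustrated with plain Python dictionaries standing in for documents in a document store: two records in the same collection need not share fields. The collection, field names, and values below are invented for the sketch.

```python
# Two "documents" in the same hypothetical collection; no fixed schema,
# so each record can carry different fields.
post1 = {"_id": 1, "user": "asha", "text": "Hello!", "likes": 10}
post2 = {"_id": 2, "user": "ravi", "text": "Photo day",
         "tags": ["travel", "beach"],     # extra field, no migration needed
         "location": {"city": "Goa"}}     # nested, semi-structured data

collection = [post1, post2]

# Queries over a schema-free collection must tolerate missing fields.
tagged = [d["_id"] for d in collection if "tags" in d]
print(tagged)   # [2]
```

In a real document database such as MongoDB the query would be issued through a driver rather than a list comprehension, but the underlying point is the same: the structure of each document can differ without any schema change.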

Comparison between SQL (relational) and NoSQL (non-relational) databases

| Feature | SQL Databases | NoSQL Databases |
| --- | --- | --- |
| Query Language | Use Structured Query Language (SQL), a standardized language for querying and manipulating data | No standardized query language; queries vary by database (e.g., MongoDB uses JavaScript-based queries, Cassandra uses CQL) |
| Data Model | Relational (table-based); organizes data in tables with rows and columns | Non-relational; includes various models like document, key-value, column-family, and graph |
| Schema Structure | Fixed schema; requires predefined tables with specific columns and data types | Flexible schema; allows semi-structured or unstructured data without predefined schemas |
| Scaling Approach | Vertical scaling (adding more power to a single server) is common, although some relational databases can scale horizontally | Horizontal scaling (adding more servers) is typical, allowing better distribution across clusters |
| Transaction Complexity | Suitable for complex transactions with multiple operations across tables, often requiring joins and complex logic | Best for simpler transactions, often involving single or limited collections/tables without complex joins |
| Transaction Properties | ACID (Atomicity, Consistency, Isolation, Durability) compliant, ensuring strict consistency and reliability | BASE (Basically Available, Soft state, Eventual consistency); prioritizes availability and performance over strict consistency in distributed systems |
| Performance | Good performance for structured data and complex queries, though it can slow down with very large datasets | Optimized for high-performance data operations, especially with unstructured data and large datasets; generally faster for simple data access patterns |
| Joins | Supports joins to combine data from multiple tables, though they can slow down with complex joins in large datasets | Generally avoids joins; instead, denormalization is used to duplicate data where needed for faster access |
| Flexibility | Low flexibility; changing the schema or data structure can be complex and often requires migrations | High flexibility; data structures can evolve over time without significant overhead |
| Data Integrity | High data integrity due to ACID properties, which make SQL databases ideal for banking and financial applications | Lower integrity control, but high availability; BASE properties allow for scenarios where eventual consistency is acceptable (e.g., social media) |
| Use Cases | Suitable for applications needing complex queries, transactions, and data integrity, such as banking, e-commerce, and ERP systems | Ideal for big data, content management, social media, IoT, and real-time analytics where data structure varies or large datasets require fast processing |
| Examples | MySQL, PostgreSQL, Oracle, SQL Server | MongoDB, Cassandra, Redis, Neo4j, Couchbase |

Pandas and NumPy are two foundational libraries in Python, widely used for data
manipulation, analysis, and scientific computing.

Pandas

Pandas is a library designed specifically for data analysis. It offers powerful,
flexible data structures such as Series (1-dimensional) and DataFrame (2-dimensional),
which allow easy manipulation and analysis of structured data. With Pandas, you can
perform operations such as data cleaning, filtering, grouping, merging, and reshaping.
It is particularly valuable for handling large datasets and supports various file
formats, such as CSV, Excel, SQL, and more.
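The cleaning, filtering, and grouping operations mentioned above can be sketched in a few lines. The sales data, column names, and values below are made up for illustration.

```python
import pandas as pd

# Illustrative sales data with one missing value.
df = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "product": ["A", "A", "B", "B"],
    "sales": [100, 150, None, 120],
})

df["sales"] = df["sales"].fillna(0)              # data cleaning: fill missing values
high = df[df["sales"] > 100]                     # filtering: rows with sales above 100
by_region = df.groupby("region")["sales"].sum()  # grouping: total sales per region
print(by_region["North"], by_region["South"])    # 100.0 270.0
```

The same DataFrame could have been loaded from a file with `pd.read_csv`, `pd.read_excel`, or `pd.read_sql`, which is where the file-format compatibility mentioned above comes in.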

NumPy

NumPy, short for "Numerical Python," provides support for large, multi-dimensional
arrays and matrices. It also includes a wide range of mathematical functions for
operating on these arrays, enabling efficient computations in a structured, optimized
manner. NumPy arrays are faster and more memory-efficient than standard Python lists,
making NumPy a core library for numerical computing and an essential component in
fields like machine learning and data science.
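A minimal sketch of the array operations described above; the array values are arbitrary.

```python
import numpy as np

# A small 2-D array; vectorized operations avoid explicit Python loops.
a = np.array([[1.0, 2.0], [3.0, 4.0]])

col_means = a.mean(axis=0)   # mean of each column -> [2. 3.]
scaled = a * 10              # elementwise operation applied to the whole array
total = scaled.sum()         # 100.0
print(col_means, total)
```

Every element of the array shares one data type (here `float64`), which is what makes these operations fast and memory-efficient compared with Python lists.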

Comparison between Pandas and NumPy

| Feature | Pandas | NumPy |
| --- | --- | --- |
| Primary Purpose | Data manipulation and analysis for structured data, especially tabular datasets | Numerical and scientific computing with multi-dimensional array support |
| Data Structures | Series and DataFrame (the older Panel structure for higher dimensions is deprecated) | ndarray (multi-dimensional arrays) |
| Data Type Support | Supports mixed data types within DataFrames (e.g., integers, floats, strings) | Primarily numerical data; arrays must have a single data type |
| Performance | Slower for large, purely numerical computations but optimized for labeled, tabular data | Fast for numerical computations, especially with large datasets |
| Indexing & Labeling | Allows labeled indexing, which makes accessing data by row and column names convenient | Primarily integer-based indexing; lacks direct support for row/column labels |
| Data Analysis Tools | Built-in tools for data filtering, grouping, joining, and reshaping | Provides foundational mathematical functions, but fewer data analysis functions |
| File I/O Support | Direct support for reading/writing data in formats like CSV, Excel, SQL, etc. | Limited I/O support; usually relies on Pandas for data import/export |
| Use Cases | Ideal for structured data manipulation, e.g., data cleaning, aggregations, and working with time series | Ideal for performing mathematical operations on large numerical datasets, linear algebra, and matrix manipulation |
| Memory Efficiency | Less memory-efficient due to mixed data types; often requires more memory for large DataFrames | Memory-efficient with homogeneous arrays, especially when handling large amounts of numerical data |
| Dependencies | Built on top of NumPy and relies on it for array manipulation | Stands alone as a base library for array manipulation |

Data Extraction

It is the process of retrieving specific data from various sources, such as databases, web
pages, files, or APIs, often as part of data processing or data integration workflows. It
involves pulling relevant data to make it available for analysis or further transformation.

Data Import

It refers to bringing external data into a software application or a programming
environment, such as Python, for analysis. This usually involves loading data from
files (like CSV, Excel, or JSON) or databases into data structures (e.g., Pandas
DataFrames) for easier manipulation and exploration.

| Aspect | Data Extraction | Data Import |
| --- | --- | --- |
| Purpose | Retrieves data from source systems | Loads data into a specific application or environment |
| Scope | Focuses on accessing raw data | Focuses on structuring data for use |
| Sources | Can include varied sources like databases, web pages, APIs | Often involves files or databases |
| Process | May involve filtering and parsing | Typically a straightforward loading process |
| Goal | Aims to collect data | Aims to make data accessible within a tool or environment |
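The distinction in the table can be sketched with the standard library: extraction parses and filters raw records from a source, while import loads the result into an environment for analysis. The CSV text, field names, and the in-memory SQLite table below are all illustrative (the table stands in for a DataFrame or database).

```python
import csv
import io
import sqlite3

# Extraction: pull and filter raw records from a source (here, CSV text).
raw = "id,city,temp\n1,Pune,31\n2,Delhi,39\n3,Shimla,18\n"
rows = [r for r in csv.DictReader(io.StringIO(raw)) if int(r["temp"]) > 20]

# Import: load the extracted data into a tool for further analysis.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (id INTEGER, city TEXT, temp INTEGER)")
conn.executemany("INSERT INTO weather VALUES (:id, :city, :temp)", rows)
count = conn.execute("SELECT COUNT(*) FROM weather").fetchone()[0]
print(count)   # 2
```

In a Pandas workflow the import step would instead be a single call such as `pd.read_csv`, which wraps both parsing and loading.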
