AWS Data Engineering Cheatsheet2

The AWS Data Engineering Cheatsheet provides essential information on commonly used AWS services, including Amazon Redshift, Amazon S3, and Amazon Athena. It covers features, components, management commands, and security aspects for each service, aimed at assisting data engineers in utilizing AWS effectively. The document serves as a quick reference guide for performing various data engineering tasks within the AWS ecosystem.

Uploaded by

Lahbib Fedi

2/22/25, 7:28 PM AWS Data Engineering Cheatsheet

Nata in Data

AWS Data Engineering Cheatsheet

Nataindata
May 20, 2024 • 11 min read

Hello dears, here you can find cheat sheets for the AWS services most
commonly used in Data Engineering:

Amazon Redshift Cheat Sheet

Amazon S3 Cheat Sheet

Amazon Athena Cheat Sheet

Amazon Kinesis Cheat Sheet

https://www.nataindata.com/blog/aws-data-engineering-cheat-sheet/

Amazon Redshift Cheat Sheet

Overview
Amazon Redshift is a fully managed, petabyte-scale data warehouse service
that extends data warehouse queries to your data lake. It allows you to run
analytic queries against petabytes of data stored locally in Redshift and
directly against exabytes of data stored in S3. Redshift is designed for OLAP
(Online Analytical Processing).

Redshift clusters run in a single Availability Zone by default. Multi-AZ
deployments are also available for clusters on RA3 node types.

Features
Columnar Storage: Redshift uses columnar storage, data
compression, and zone maps to minimize the amount of I/O needed
for queries.

Parallel Processing: It utilizes a massively parallel processing (MPP)
data warehouse architecture to distribute SQL operations across
multiple nodes.

Machine Learning: Redshift leverages machine learning to optimize
throughput based on workloads.

Result Caching: Provides sub-second response times for repeat
queries.

Automated Backups: Redshift continuously backs up your data to
S3 and can replicate snapshots to another region for disaster
recovery.
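
To make the zone-map idea from the Columnar Storage bullet concrete, here is a minimal Python sketch. It assumes each storage block keeps only the minimum and maximum of the values it holds, so a range filter can skip blocks that cannot match. The `Block` class and `blocks_to_scan` function are hypothetical names for illustration, not Redshift APIs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A toy storage block holding values of one column."""
    values: List[int]

    @property
    def zone_map(self) -> Tuple[int, int]:
        # The zone map is just the (min, max) of the block's values.
        return min(self.values), max(self.values)


def blocks_to_scan(blocks: List[Block], low: int, high: int) -> List[int]:
    """Return indexes of blocks whose [min, max] overlaps [low, high]."""
    hits = []
    for index, block in enumerate(blocks):
        block_min, block_max = block.zone_map
        if block_max >= low and block_min <= high:
            hits.append(index)
    return hits


# Sorted data gives tight, non-overlapping zone maps.
blocks = [Block([1, 5, 9]), Block([10, 14, 19]), Block([20, 25, 29])]
print(blocks_to_scan(blocks, 12, 15))  # [1]
```

Only one of the three blocks is read for the filter `BETWEEN 12 AND 15`, which is why sorted data tends to benefit most from zone maps.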

Components
Cluster: Comprises a leader node and one or more compute nodes.
A database is created upon provisioning a cluster for loading data
and running queries.

Scaling: Clusters can be scaled in/out by adding/removing nodes
and scaled up/down by changing node types.

Maintenance Window: Redshift assigns a 30-minute maintenance
window randomly within an 8-hour block per region each week.
If maintenance is scheduled for that week, the cluster may be
briefly unavailable and in-flight queries are terminated.

Deployment Platforms: Clusters launch in a VPC (EC2-VPC). The
older EC2-Classic platform has been retired by AWS.

Redshift Nodes
Leader Node: Manages client connections, parses queries, and
coordinates execution plans with compute nodes.

Compute Nodes: Execute query plans, exchange data, and send
intermediate results to the leader node for aggregation.

Node Types
RA3: Uses Redshift Managed Storage, so compute and storage scale
independently. Several features described below require RA3 nodes.

Dense Compute (DC): Optimized for performance-intensive
workloads using SSD storage.

Dense Storage (DS): For large data workloads using HDD storage.
This is a legacy node type.

Parameter Groups
Parameter groups apply to all databases within a cluster. The default
parameter group has preset values and cannot be modified.

Database Querying Options

Query Editor: Use the AWS Management Console to connect to
your cluster and run queries.

SQL Client Tools: Connect via standard ODBC and JDBC
connections.

Enhanced VPC Routing: Manages data flow between your cluster
and other resources using VPC features.

Redshift Spectrum
Query Exabytes of Data: Run queries against data in S3 without
loading or transforming it.

Columnar Format: Scans only the needed columns for your query,
reducing data processing.

Compression Algorithms: Scans less data when data is compressed
with supported algorithms.

Redshift Streaming Ingestion

Streaming Data: Consume and process data directly from
streaming sources like Amazon Kinesis Data Streams and Amazon
Managed Streaming for Apache Kafka (MSK).

Low Latency: Provides high-speed ingestion without staging data in
S3.

Redshift ML
Machine Learning: Train and deploy machine learning models using
SQL commands within Redshift.

In-Database Inference: Perform in-database predictions without
moving data.

SageMaker Integration: Utilizes Amazon SageMaker Autopilot to
find the best model for your data.

Redshift Data Sharing

Live Data Sharing: Securely share live data across Redshift clusters,
within or across AWS accounts, without copying data.

Up-to-Date Information: Users always access the most current data
in the warehouse.

No Additional Cost: Available on Redshift RA3 clusters without
extra charges.

Redshift Cross-Database Query

Query Across Databases: Allows querying across different
databases within a Redshift cluster, regardless of the database you
are connected to. This feature is available on Redshift RA3 node
types at no extra cost.

Cluster Snapshots
Types: There are two types of snapshots, automated and manual.
Both are stored in S3 and transferred over an encrypted SSL
connection.

Automated Snapshots: Taken every 8 hours or after 5 GB per node of
data changes, whichever comes first, and are enabled by default.
They are deleted at the end of a one-day retention period, which
can be modified.

Manual Snapshots: Retained indefinitely unless manually deleted.
Can be shared with other AWS accounts.

Cross-Region Snapshots: Snapshots can be copied to another AWS
Region for disaster recovery, with a default retention period of
seven days.

Monitoring
Audit Logging: Tracks authentication attempts, connections,
disconnections, user definition changes, and queries. Logs are
stored in S3.

Event Tracking: Redshift retains information about events for
several weeks.

Performance Metrics: Uses CloudWatch to monitor physical
aspects like CPU utilization, latency, and throughput.

Query/Load Performance Data: Helps monitor database activity
and performance.

CloudWatch Alarms: Optionally configured to monitor disk space
usage across cluster nodes.

Security
Access Control: By default, only the AWS account that creates the
cluster can access it.

IAM Integration: Create user accounts and manage permissions
using IAM.

Security Groups: Use VPC security groups to control inbound access
to clusters in a VPC. Redshift cluster security groups apply only to
the retired EC2-Classic platform.

Encryption: Optionally encrypt clusters upon provisioning.
Encrypted clusters' snapshots are also encrypted.

Pricing
Billing: Pay per second based on the type and number of nodes in
your cluster.

Spectrum Scanning: Pay for the number of bytes scanned by
Redshift Spectrum.

Reserved Nodes: Save costs by committing to 1-year or 3-year terms.

Cluster Management

Creating a Cluster

# The password is a placeholder. Redshift requires 8-64 characters with
# at least one uppercase letter, one lowercase letter, and one digit.
aws redshift create-cluster \
    --cluster-identifier my-redshift-cluster \
    --node-type dc2.large \
    --master-username masteruser \
    --master-user-password 'Example-Passw0rd' \
    --cluster-type multi-node \
    --number-of-nodes 2

Deleting a Cluster

aws redshift delete-cluster \
    --cluster-identifier my-redshift-cluster \
    --skip-final-cluster-snapshot

Describing a Cluster
aws redshift describe-clusters \
--cluster-identifier my-redshift-cluster

Database Management
Connecting to the Database
Use a PostgreSQL-compatible tool such as psql or a SQL client. Redshift
listens on port 5439 by default, and dev is the default database name:

psql -h my-cluster.cduijjmc4xkx.us-west-2.redshift.amazonaws.com -p 5439 -U masteruser -d dev

Creating a Database

CREATE DATABASE mydb;

Dropping a Database

DROP DATABASE mydb;

User Management
Creating a User
-- Placeholder password; it must contain upper case, lower case, and a digit.
CREATE USER myuser WITH PASSWORD 'MyPassw0rd';

Dropping a User

DROP USER myuser;

Granting Permissions

GRANT ALL PRIVILEGES ON DATABASE mydb TO myuser;

Revoking Permissions

REVOKE ALL PRIVILEGES ON DATABASE mydb FROM myuser;

Table Management

Creating a Table

-- Redshift treats PRIMARY KEY as informational; it is not enforced.
CREATE TABLE mytable (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    age INT
);

Dropping a Table

DROP TABLE mytable;

Inserting Data

INSERT INTO mytable (id, name, age) VALUES (1, 'John Doe', 30);

Updating Data

UPDATE mytable SET age = 31 WHERE id = 1;

Deleting Data

DELETE FROM mytable WHERE id = 1;

Querying Data

SELECT * FROM mytable;

Performance Tuning
Analyzing a Table

ANALYZE mytable;

Vacuuming a Table

VACUUM mytable;

Redshift Distribution Styles

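AUTO: The default. Redshift picks a style based on table size and
may change it as the table grows.

KEY: Distributes rows based on the values in one column, so rows
with the same value land on the same slice.

EVEN: Distributes rows across slices in round-robin fashion.

ALL: Copies the entire table to each node.

Example: Creating a Table with Distribution Key

CREATE TABLE mytable (
    id INT,
    name VARCHAR(50),
    age INT
)
DISTSTYLE KEY
DISTKEY(id);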
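
The effect of each distribution style can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not Redshift's implementation: `zlib.crc32` stands in for Redshift's internal hash function, and `distribute` is a hypothetical helper that treats every target as one slice (or one node for ALL).

```python
import zlib
from itertools import cycle


def distribute(rows, style, targets, key=None):
    """Assign rows to `targets` buckets according to a distribution style."""
    buckets = [[] for _ in range(targets)]
    if style == "EVEN":
        # Round-robin: row 0 -> bucket 0, row 1 -> bucket 1, and so on.
        for row, target in zip(rows, cycle(range(targets))):
            buckets[target].append(row)
    elif style == "KEY":
        for row in rows:
            # crc32 is an assumption standing in for the real hash.
            target = zlib.crc32(str(row[key]).encode()) % targets
            buckets[target].append(row)
    elif style == "ALL":
        # Every bucket receives a full copy of the table.
        buckets = [list(rows) for _ in range(targets)]
    else:
        raise ValueError(f"unknown style: {style}")
    return buckets


# 90 rows for "US" and 10 rows for "NZ": a skewed, low-cardinality column.
rows = [{"id": i, "country": "US" if i % 10 else "NZ"} for i in range(100)]

even = distribute(rows, "EVEN", 4)
by_country = distribute(rows, "KEY", 4, key="country")

print([len(b) for b in even])        # [25, 25, 25, 25]
print([len(b) for b in by_country])  # skewed: at most two buckets are used
```

The second print shows why a low-cardinality column is usually a poor DISTKEY: most slices sit idle while one holds nearly all rows.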

Backup and Restore

Creating a Snapshot

aws redshift create-cluster-snapshot \
    --snapshot-identifier my-snapshot \
    --cluster-identifier my-redshift-cluster

Restoring from a Snapshot

aws redshift restore-from-cluster-snapshot \
    --snapshot-identifier my-snapshot \
    --cluster-identifier my-new-cluster

Security
Enabling SSL
In psql or your SQL client, use the sslmode parameter:

psql "host=my-cluster.cduijjmc4xkx.us-west-2.redshift.amazonaws.com dbname=dev

Managing Cluster Security Groups (EC2-Classic only; clusters in a VPC use VPC security groups instead)

aws redshift create-cluster-security-group --cluster-security-group-name my-sec

aws redshift authorize-cluster-security-group-ingress --cluster-security-group-

Maintenance
Resizing a Cluster

aws redshift modify-cluster \
    --cluster-identifier my-redshift-cluster \
    --node-type dc2.large \
    --number-of-nodes 4

Monitoring Cluster Performance

Use Amazon CloudWatch to monitor:

CPU Utilization

Database Connections

Read/Write IOPS

Network Traffic

Viewing Cluster Events

aws redshift describe-events \
    --source-identifier my-redshift-cluster \
    --source-type cluster

Amazon S3 Cheat Sheet

Overview
Amazon S3 (Simple Storage Service) stores data as objects within buckets.
Each object includes a file and optional metadata that describes the file. A
key is a unique identifier for an object within a bucket, and storage capacity
is virtually unlimited.

Buckets
Access Control: For each bucket, you can control access, create,
delete, and list objects, view access logs, and choose the
geographical region for storage.

Naming: Bucket names must be globally unique, DNS-compliant names
across all existing S3 buckets. Once created, the name cannot be
changed and is visible in the URL pointing to the objects in the
bucket.

Limits: The default quota was long 100 buckets per AWS account.
It is a soft limit, and AWS has since raised the default, so check
Service Quotas for the current value. The region of a bucket cannot
be changed after creation.

Static Website Hosting: Buckets can be configured to host static
websites.

Deletion Restrictions: Buckets with 100,000 or more objects
cannot be deleted via the S3 console. Buckets with versioning
enabled cannot be deleted via the AWS CLI.
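
As a rough illustration of what "DNS-compliant" means for bucket names, the sketch below checks a subset of the S3 naming rules: 3 to 63 characters, only lowercase letters, digits, dots, and hyphens, a letter or digit at both ends, no adjacent dots, and no IP-address look-alikes. The function name `is_valid_bucket_name` is hypothetical, and reserved prefixes and suffixes are deliberately left out.

```python
import re

# First and last characters must be a lowercase letter or digit;
# the 1-61 characters in between may also include dots and hyphens.
_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IPV4 = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Check a subset of the S3 general purpose bucket naming rules."""
    if not _NAME.fullmatch(name):
        return False
    if ".." in name:
        return False  # adjacent periods are not allowed
    if _IPV4.fullmatch(name):
        return False  # must not be formatted like an IP address
    return True


for candidate in ["my-data-lake-2024", "My_Bucket", "ab", "192.168.1.1"]:
    print(candidate, is_valid_bucket_name(candidate))
```

A check like this only catches malformed names early; global uniqueness can only be confirmed by S3 itself when the bucket is created.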

Data Consistency Model

Strong Read-After-Write Consistency: Applies to PUTs of new objects,
overwrite PUTs, and DELETEs in all regions. A subsequent GET, HEAD,
or LIST request reflects the latest write.

Eventual Consistency: Applies to bucket configuration changes, such
as listing all buckets right after a deletion or enabling versioning
on a bucket for the first time.
Storage Classes
Frequently Accessed Objects
S3 Standard: General-purpose storage for frequently accessed
data.

S3 Express One Zone: High-performance, single-AZ storage class
for latency-sensitive applications, offering improved access speeds
and reduced request costs compared to S3 Standard.

Infrequently Accessed Objects

S3 Standard-IA: For long-lived but less frequently accessed data,
with redundant storage across multiple AZs.

S3 One Zone-IA: Less expensive, stores data in one AZ, and is not
resilient to AZ loss. Best suited to objects over 128 KB that are
stored for at least 30 days, because smaller or shorter-lived objects
are billed at those minimums.

Amazon S3 Intelligent-Tiering
Automatic Cost Optimization: Moves data between frequent and
infrequent access tiers based on access patterns.

Monitoring: Moves objects to infrequent access after 30 days
without access, and to archive tiers after 90 and 180 days without
access.

No Retrieval Fees: Optimizes costs without performance impact.
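
The tiering thresholds above can be written as a small lookup. This is a sketch of the rule as stated in this cheat sheet (30, 90, and 180 days without access); the tier labels are simplified, and `intelligent_tier` is a hypothetical helper rather than anything S3 exposes.

```python
def intelligent_tier(days_since_last_access: int) -> str:
    """Map days without access to a simplified Intelligent-Tiering tier."""
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    if days_since_last_access >= 180:
        return "deep archive"
    if days_since_last_access >= 90:
        return "archive"
    if days_since_last_access >= 30:
        return "infrequent access"
    return "frequent access"


for days in (0, 45, 120, 400):
    print(days, "->", intelligent_tier(days))
```

Accessing an object resets its counter, so in this model it moves back to the frequent access tier.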

S3 Glacier
Long-Term Archive: Provides storage classes like Glacier Instant
Retrieval, Glacier Flexible Retrieval, and Glacier Deep Archive for
long-term archiving.

Access: Objects in Glacier Flexible Retrieval and Glacier Deep
Archive must be restored before access, while Glacier Instant
Retrieval serves objects in milliseconds. Archived objects are only
visible through S3.

Retrieval Options (Glacier Flexible Retrieval)
Expedited: Access data within 1-5 minutes for urgent requests.

Standard: Default option, typically completes within 3-5 hours.

Bulk: Lowest-cost option for retrieving large amounts of data,
typically completes within 5-12 hours.

Additional Information
Object Storage: For S3 Standard, Standard-IA, and Glacier classes,
objects are stored across multiple devices in at least three AZs.

Amazon Athena Cheat Sheet

Overview
Amazon Athena is an interactive query service that allows you to analyze
data directly in Amazon S3 and other data sources using SQL. It is serverless
and built on Presto, an open-source, distributed SQL query engine optimized
for low-latency, ad hoc analysis. Newer Athena engine versions are based on
Trino, a fork of Presto.

Features
Serverless: No infrastructure to manage.

Built-in Query Editor: Allows you to write and execute queries
directly in the Athena console.

Wide Data Format Support: Supports formats such as CSV, JSON,
ORC, Avro, and Parquet.

Parallel Query Execution: Executes queries in parallel to provide
fast results, even for large datasets.

Amazon S3 Integration: Uses S3 as the underlying data store,
ensuring high availability and durability.

Data Visualization: Integrates with Amazon QuickSight.

AWS Glue Integration: Works seamlessly with AWS Glue for data
cataloging.

Managed Data Catalog: Stores metadata and schemas for your S3-
stored data.

Queries
Geospatial Data: You can query geospatial data.

Log Data: Supports querying various log types.

Query Results: Results are stored in S3.

Query History: Retains history for 45 days.

User-Defined Functions (UDFs): Supports scalar UDFs, executed
with AWS Lambda, to process records or groups of records.

Data Types: Supports both simple (e.g., INTEGER, DOUBLE,
VARCHAR) and complex (e.g., MAP, ARRAY, STRUCT) data types.

Requester Pays Buckets: Supports querying data in S3 Requester
Pays buckets.

Athena Federated Queries

Data Connectors: Allows querying data sources beyond S3 using
data connectors implemented in Lambda functions via the Athena
Query Federation SDK.

Pre-built Connectors: Available for popular data sources like
MySQL, PostgreSQL, Oracle, SQL Server, DynamoDB, MSK,
Redshift, OpenSearch, CloudWatch Logs, CloudWatch metrics, and
DocumentDB.

Custom Connectors: You can write custom data connectors or
customize pre-built ones using the Athena Query Federation SDK.

Optimizing Query Performance

Data Partitioning: Partitioning data by column values (e.g., date,
country, region) reduces the amount of data scanned by a query.

Columnar Formats: Converting data to columnar formats like
Parquet and ORC improves performance.

File Compression: Compressing files reduces the amount of data
scanned.

Splittable Files: Using splittable files allows Athena to read them in
parallel, speeding up query completion. Formats like Avro,
Parquet, and ORC are splittable, regardless of the compression
codec. Text files (such as CSV and JSON) are splittable when
compressed only if the codec is BZIP2 or LZO; gzip-compressed text
files are not splittable.
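
Partitioning in practice usually means writing objects under Hive-style `key=value` prefixes, one common layout that Athena can map to partition columns. The sketch below builds such prefixes; the bucket `example-bucket`, the column names, and the helper `partition_prefix` are all assumptions made for illustration.

```python
from datetime import date, timedelta


def partition_prefix(base: str, day: date, country: str) -> str:
    """Build a Hive-style S3 prefix such as .../year=2024/month=05/..."""
    return (
        f"{base.rstrip('/')}/year={day.year}/month={day.month:02d}/"
        f"day={day.day:02d}/country={country}/"
    )


def prefixes_for_range(base: str, start: date, days: int, country: str):
    """Prefixes a query filtered to `days` consecutive days would touch."""
    return [
        partition_prefix(base, start + timedelta(days=offset), country)
        for offset in range(days)
    ]


base = "s3://example-bucket/events"  # hypothetical location
print(partition_prefix(base, date(2024, 5, 20), "NZ"))
# s3://example-bucket/events/year=2024/month=05/day=20/country=NZ/
print(len(prefixes_for_range(base, date(2024, 5, 30), 3, "NZ")))  # 3
```

A query that filters on `year`, `month`, `day`, and `country` only needs to read objects under the matching prefixes, which is where the reduction in scanned data comes from.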

Cost Controls
Workgroups: Isolate queries by teams, applications, or workloads
and enforce cost controls.

Per-Query Limit: Sets a threshold for the total amount of data
scanned per query, canceling any query that exceeds this limit.

Per-Workgroup Limit: Limits the total amount of data scanned by
all queries within a specified timeframe, with multiple limits based
on hourly or daily data scan totals.

Amazon Athena Security

Access Control: Use IAM policies, access control lists, and S3 bucket
policies to control data access.

Encrypted Data: Queries can be performed directly on encrypted
data in S3.

Amazon Athena Pricing

Pay Per Query: Charged based on the amount of data scanned by
each query.

No Charge for Failed Queries: You are not charged for queries that
fail.

Cost Savings: Compressing, partitioning, or converting data to
columnar formats reduces the amount of data scanned, leading to
cost savings and performance gains.
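
A back-of-the-envelope calculation shows how scan volume drives the bill. The numbers here are assumptions, not quoted prices: a rate of 5.00 USD per TB scanned, a 10 MB minimum per query, a Parquet conversion that shrinks the data to a quarter of its CSV size, and a table partitioned into 30 equal days.

```python
PRICE_PER_TB_USD = 5.00      # assumed rate; check current regional pricing
TB = 1024 ** 4               # bytes per TB as used in this sketch
MIN_BYTES = 10 * 1024 ** 2   # assumed 10 MB minimum billed per query


def query_cost(bytes_scanned: float, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimate the cost in USD of one query from the bytes it scans."""
    billable = max(bytes_scanned, MIN_BYTES)
    return billable / TB * price_per_tb


raw_csv = 1 * TB            # full scan of uncompressed CSV
parquet = raw_csv * 0.25    # assumed 4x reduction after conversion
one_day = parquet / 30      # partition pruning down to one of 30 days

for label, scanned in [("raw CSV", raw_csv), ("Parquet", parquet), ("Parquet, 1 day", one_day)]:
    print(f"{label:>15}: ${query_cost(scanned):.4f}")
```

The same arithmetic explains why a per-query limit in a workgroup is effective: capping bytes scanned caps the cost of any single query.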

Amazon Kinesis Cheat Sheet

Overview
Amazon Kinesis makes it easy to collect, process, and analyze real-time
streaming data. It can ingest real-time data such as video, audio, application
logs, website clickstreams, and IoT telemetry data for machine learning,
analytics, and other applications.

Kinesis Video Streams

A fully managed service for streaming live video from devices to the AWS
Cloud or building applications for real-time video processing or batch-
oriented video analytics.

Benefits
Device Connectivity: Connect and stream from millions of devices.

Custom Retention Periods: Configure video streams to durably
store media data for custom retention periods, generating an index
based on timestamps.

Serverless: No infrastructure setup or management required.

Security: Enforces TLS-based encryption for data streaming and
encrypts all data at rest using AWS KMS.

Components
Producer: Source that puts data into a Kinesis video stream.

Kinesis Video Stream: Enables the transportation, optional
storage, and real-time or batch consumption of live video data.

Consumer: Retrieves data from a Kinesis video stream to view,
process, or analyze it.

Fragment: A self-contained sequence of frames with no
dependencies on other fragments.

Video Playbacks
HLS (HTTP Live Streaming): For live playback.

GetMedia API: For building custom applications to process video
streams in real time with low latency.

Metadata
Nonpersistent Metadata: Ad hoc metadata for specific fragments.

Persistent Metadata: Metadata for consecutive fragments.

Pricing
Pay for the volume of data ingested, stored, and consumed.

Kinesis Data Streams

A scalable, durable data ingestion and processing service optimized for
streaming data.

Components
Data Producer: Application emitting data records to a Kinesis data
stream, assigning partition keys to records.

Data Consumer: Application or AWS service retrieving data from all
shards in a stream for real-time analytics or processing.

Data Stream: A logical grouping of shards. Records are retained for
24 hours by default, up to 7 days with extended retention, or up to
365 days with long-term retention.

Shard: The base throughput unit. Each shard ingests up to 1,000
records or 1 MB per second and serves reads at up to 2 MB per
second. Records within a shard are ordered by arrival time.
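
The per-shard write limits translate directly into a sizing rule: a stream needs enough shards to satisfy whichever limit is hit first, records per second or MB per second. Here is a minimal sketch, assuming 1 MB = 1,024 KB and a hypothetical helper named `shards_needed`; it covers ingest only and ignores read throughput and headroom.

```python
import math

RECORDS_PER_SHARD = 1000   # write limit per shard, records per second
MB_PER_SHARD = 1.0         # write limit per shard, MB per second


def shards_needed(records_per_sec: float, avg_record_kb: float) -> int:
    """Estimate the shard count required for a given ingest rate."""
    by_count = records_per_sec / RECORDS_PER_SHARD
    by_bytes = (records_per_sec * avg_record_kb / 1024) / MB_PER_SHARD
    # The binding constraint is the larger of the two; always at least 1.
    return max(1, math.ceil(max(by_count, by_bytes)))


print(shards_needed(2500, 0.2))  # 3, limited by record count
print(shards_needed(500, 4))     # 2, limited by bytes per second
```

Small records tend to hit the record-count limit first, while large records hit the byte limit first, so both inputs matter when sizing a stream.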

Data Record
Record: Unit of data in a stream with a sequence number, partition
key, and data blob (max 1 MB).

Partition Key: Identifier (e.g., user ID, timestamp) used to route
records to shards.
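
Routing by partition key works by hashing: Kinesis maps each key to a 128-bit integer with MD5, and each shard owns a contiguous range of that hash space. The sketch below reproduces the idea locally. It assumes shards split the space evenly (real streams can have uneven ranges after splits and merges), and `even_shard_ranges` and `shard_for_key` are hypothetical helper names.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def even_shard_ranges(shard_count: int):
    """Split the hash space into `shard_count` contiguous (start, end) ranges."""
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the space is fully covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key: str, ranges) -> int:
    """Return the index of the shard whose hash range contains the key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_value <= end:
            return index
    raise ValueError("hash value outside all shard ranges")


ranges = even_shard_ranges(4)
for key in ("user-1", "user-2", "user-42"):
    print(key, "-> shard", shard_for_key(key, ranges))
```

Because the same key always hashes to the same value, all records for one key land on one shard, which preserves their order but can create a hot shard if a single key dominates traffic.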

Sequence Number
Unique identifier for each data record, assigned by Kinesis when
data is added.

Monitoring
Monitor shard-level metrics using CloudWatch, Kinesis Agent, and
Kinesis libraries. Log API calls with CloudTrail.

Security
Automatically encrypt sensitive data with AWS KMS.

Use IAM for access control and VPC endpoints to keep traffic within
the Amazon network.

Pricing
Charged per shard hour, PUT Payload Unit, and enhanced fan-out
usage. Extended data retention incurs additional charges.

Kinesis Data Firehose

The easiest way to load streaming data into data stores and analytics tools.
AWS now calls this service Amazon Data Firehose.

Features
Scalable: Automatically scales to match data throughput.

Data Transformation: Can batch, compress, and encrypt data
before loading it.

Destination Support: Captures, transforms, and loads data into S3,
Redshift, Amazon OpenSearch Service (formerly Elasticsearch), HTTP
endpoints, and service providers like Datadog, New Relic, MongoDB,
and Splunk.

Batch Size and Interval: Control data upload frequency and size.

Data Delivery and Transformation

Lambda Integration: Transforms incoming data before delivery.

Format Conversion: Converts JSON to Parquet or ORC for storage
in S3.

Buffer Configuration: Controls data buffering before delivery to
destinations.
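
Buffering by size and interval means a batch is delivered as soon as either threshold is reached. The following sketch models that rule in plain Python. The class `SizeOrTimeBuffer` and its parameters are hypothetical, and unlike the real service it only checks the timer when a new record arrives.

```python
import time
from typing import Callable, List


class SizeOrTimeBuffer:
    """Collect records and deliver them when size or age crosses a limit."""

    def __init__(self, max_bytes: int, max_seconds: float,
                 deliver: Callable[[bytes], None],
                 clock: Callable[[], float] = time.monotonic):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.deliver = deliver
        self._clock = clock
        self._records: List[bytes] = []
        self._size = 0
        self._started = clock()

    def put(self, record: bytes) -> None:
        self._records.append(record)
        self._size += len(record)
        too_big = self._size >= self.max_bytes
        too_old = self._clock() - self._started >= self.max_seconds
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        if self._records:
            self.deliver(b"".join(self._records))
        # Start a fresh batch and restart the interval timer.
        self._records = []
        self._size = 0
        self._started = self._clock()


delivered = []
buffer = SizeOrTimeBuffer(max_bytes=10, max_seconds=60, deliver=delivered.append)
buffer.put(b"abcd")     # 4 bytes buffered, nothing delivered yet
buffer.put(b"efghij")   # 10 bytes reached, batch is delivered
print(delivered)        # [b'abcdefghij']
```

Larger buffers mean fewer, bigger objects at the destination at the cost of higher delivery latency, which is the trade-off the buffer settings control.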

Pricing
Pay for the volume of data transmitted. Additional charges for data
format conversion.

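Kinesis Data Analytics

Analyze streaming data, gain insights, and respond to business needs in real
time. AWS now offers the Apache Flink side of this service under the name
Amazon Managed Service for Apache Flink.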

General Features
Serverless: Automatically manages infrastructure.

Scalable: Elastically scales to handle data volume.

Low Latency: Provides sub-second processing latencies.

SQL Features

Standard ANSI SQL: Integrates with Kinesis Data Streams and
Firehose.

Input Types: Supports streaming and reference data sources.

Schema Editor: Recognizes standard formats like JSON and CSV.

Java Features
Apache Flink: Uses open-source libraries for building streaming
applications.

State Management: Stores state in encrypted, incrementally saved
running application storage.

Exactly Once Processing: Ensures processed records affect results
exactly once.

Components
Input: Streaming source for the application.

Application Code: SQL statements processing input data.

In-Application Streams: Stores data for processing.

Kinesis Processing Units (KPU): Provides memory, computing, and
networking resources.

Pricing
Charged based on the number of KPUs used. Additional charges for
Java application orchestration and storage.

From Everand
Kubernetes Secrets Handbook: Design, implement, and maintain production-grade Kubernetes Secrets management solutions
Emmanouil Gkatziouras
No ratings yet
Net Developer's Interview Toolkit: Dot Net Interview Preparation, #3
From Everand
Net Developer's Interview Toolkit: Dot Net Interview Preparation, #3
Nirbhay Chauhan
No ratings yet
OpenProject The Ultimate Step-By-Step Guide
From Everand
OpenProject The Ultimate Step-By-Step Guide
Gerardus Blokdyk
No ratings yet
Modernizing Legacy Applications in PHP
From Everand
Modernizing Legacy Applications in PHP
Paul M. Jones
No ratings yet
Ultimate AWS Certified Solutions Architect Associate Exam Guide: Master Designing Resilient, Scalable Architectures with Core and Advanced AWS Services to Crack the SAA-C03 Certification (English Edition)
From Everand
Ultimate AWS Certified Solutions Architect Associate Exam Guide: Master Designing Resilient, Scalable Architectures with Core and Advanced AWS Services to Crack the SAA-C03 Certification (English Edition)
Venkata Sasi Kanumuri
No ratings yet
VMware Horizon View Essentials
From Everand
VMware Horizon View Essentials
Peter von Oven
No ratings yet
Master C# Interview Preparation: Dot Net Interview Preparation, #2
From Everand
Master C# Interview Preparation: Dot Net Interview Preparation, #2
Nirbhay Chauhan
No ratings yet
Mastering The Spritekit Framework: Develop Professional Games With This New Ios 7 Framework
From Everand
Mastering The Spritekit Framework: Develop Professional Games With This New Ios 7 Framework
Peter van de Put
No ratings yet