
Applied Data Science with R
Capstone Project
Nguyen Hoai An
October 15th, 2024
Outline
• Executive Summary
• Introduction
• Methodology
• Results
• Conclusion
• Appendix

Executive Summary
• The project aims to collect and analyze real-world datasets through various stages, enhancing data quality and yielding insights.
• The project tackles a challenge that requires data collection, analysis, hypothesis testing, visualization, modeling, and dashboard creation using real-world datasets.
• Key tasks include:
  • Data Collection: Gathering and understanding data from multiple sources.
  • Data Wrangling: Preparing data using regular expressions and Tidyverse.
  • Exploratory Data Analysis: Using SQL and visualization techniques via Tidyverse and ggplot2.
  • Modeling: Building linear regression models using Tidymodels.
  • Dashboard Creation: Developing an interactive dashboard with R Shiny.
Introduction
• Module 1 - Capstone Overview and Data Collection
  • Hands-on Lab: DC with Web Scraping Notebook
  • Hands-on Lab: DC with OpenWeather API Notebook
• Module 2 - Data Wrangling (DW)
  • Hands-on Lab: DW with Regular Expressions Notebook
  • Hands-on Lab: DW with dplyr Notebook
• Module 3 - Performing Exploratory Data Analysis with SQL, Tidyverse & ggplot2
  • Hands-on Lab: EDA with SQL lab using RSQLite
  • Hands-on Lab: EDA with SQL lab using RODBC with IBM DB2
  • Hands-on Lab: EDA with Data Visualization Lab
• Module 4 - Predictive Analysis
  • Hands-on Lab: Building a Baseline Regression Model Lab
  • Hands-on Lab: Improving the Linear Model Lab
• Module 5 - Building an R Shiny Dashboard App
  • Hands-on Lab: Build a bike-sharing demand prediction app
• Module 6 - Present Your Data-Driven Insights
Methodology
• Perform data collection
• Perform data wrangling
• Perform exploratory data analysis (EDA) using SQL and visualization
• Perform predictive analysis using regression models
  • How to build the baseline model
  • How to improve the baseline model
• Build an R Shiny dashboard app
Data collection

Use the 'rvest' library to obtain an HTML table from a web page:

library(rvest)

url <- "https://example.com"
root_node <- read_html(url)
table_nodes <- html_nodes(root_node, "table")
df <- html_table(table_nodes[[1]], fill = TRUE)
write.csv(df, "data.csv", row.names = FALSE)

# Obtain a CSV file directly from a URL
url <- "https://example.com/data.csv"
download.file(url, destfile = "data.csv")

Collect data from an API using httr and jsonlite:

# Load required libraries
library(httr)
library(jsonlite)

url <- "https://api.example.com/data"  # Replace with actual API URL
api_key <- "YOUR_API_KEY"              # Replace with your actual API key
data_query <- list(
  q = "query_term",  # Replace with actual query term or data filter
  appid = api_key,   # Use the API key as a query parameter
  units = "unit"     # Replace with the appropriate unit system (optional)
)
response <- GET(url, query = data_query)
json_result <- content(response, as = "parsed", type = "application/json")
data <- data.frame(
  column_1 = json_result$main$field1,  # Replace with actual data fields
  column_2 = json_result$main$field2,
  column_3 = json_result$field3
)
write.csv(data, file = "data.csv", row.names = FALSE)
Data collection: Web Scraping Notebook

1. Use the 'rvest' library to obtain the HTML table from a web page:
   library(rvest)
   url <- "https://example.com"
   root_node <- read_html(url)
   table_nodes <- html_nodes(root_node, "table")

2. Convert the table into a data frame:
   df <- html_table(table_nodes[[1]], fill = TRUE)

3. Summarize the data frame:
   glimpse(df)  # from dplyr

4. Write the data frame to a CSV file:
   write.csv(df, "file_name.csv", row.names = FALSE)
Data collection with OpenWeather API Notebook

1. API request ('httr' library)
2. Parsing the JSON response ('jsonlite' library)
3. Extracting and storing data
4. Fetching data
5. Creating a data frame
6. Displaying the data frame
7. Saving to CSV
Data wrangling

Data manipulation, cleaning, and transformation workflow with:
• the 'tidyverse' library:
  • readr
  • dplyr
  • stringr
  • tidyr
• the 'fastDummies' package
Data wrangling

• Standardize column names
• Summarize the class of each column
• Clean up the values in the web-scraped dataset
• Detect and handle missing values
• Normalize data
• Create indicator (dummy) variables for categorical variables

The first four tasks are shown with code on the following slides; a sketch of the last two appears below.
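Normalization and dummy-variable creation are not reproduced with code in this deck, so the sketch below shows one plausible approach, assuming min-max scaling and a SEASONS categorical column; the data frame and column names are illustrative, not the project's exact code.

# Min-max normalization and dummy variables (illustrative sketch)
library(dplyr)
library(fastDummies)

# Normalize a numeric column to the [0, 1] range (column name is an assumption)
data_df <- data_df %>%
  mutate(TEMPERATURE = (TEMPERATURE - min(TEMPERATURE, na.rm = TRUE)) /
           (max(TEMPERATURE, na.rm = TRUE) - min(TEMPERATURE, na.rm = TRUE)))

# Create indicator (dummy) variables for a categorical column
data_df <- dummy_cols(data_df,
                      select_columns = "SEASONS",
                      remove_selected_columns = TRUE)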
Data wrangling: standardize column names

# List of datasets
dataset_list <- c('data_1.csv', 'data_2.csv')

# Load necessary libraries
library(readr)
library(stringr)  # for the str_replace_all() function

# Loop through each dataset
for (dataset_name in dataset_list) {
  # Read dataset without column specification messages
  dataset <- read_csv(dataset_name, show_col_types = FALSE)

  # Standardize column names: convert to uppercase and replace spaces with underscores
  colnames(dataset) <- str_replace_all(toupper(colnames(dataset)), " ", "_")

  # Save the standardized dataset
  write.csv(dataset, dataset_name, row.names = FALSE)
}
Data wrangling: summarize the class of each column

# Load necessary libraries
library(readr)
library(dplyr)
library(tidyr)

# Read the dataset
data_df <- read_csv("data_1.csv", show_col_types = FALSE)

# Select specific columns
df <- data_df %>% select(column1, column2, column3)

# Summarize the class of each column and gather the results
df %>%
  summarize_all(class) %>%
  gather(variable, class)
Data wrangling: clean up the values in the web-scraped dataset

# Check if the column contains any "strange" characters (e.g. reference links such as [12])
library(stringr)
ref_pattern <- "\\[[A-z0-9]+\\]"
find_ref_pattern <- function(strings) grepl(ref_pattern, strings)
df %>%
  select(column_2) %>%
  filter(find_ref_pattern(column_2)) %>%
  slice(1:10)

# Check if the column is purely numeric
find_character <- function(strings) grepl("[^0-9]", strings)
df %>%
  select(column_1) %>%
  filter(find_character(column_1)) %>%
  slice(1:10)

# Clean column_1: remove all non-numeric characters
df <- df %>%
  mutate(column_1 = str_replace_all(column_1, "[^0-9]", ""))

# Remove reference links from a character column
remove_ref <- function(strings) {
  ref_pattern <- "\\[[A-z0-9]+\\]"
  # Replace all matched substrings using str_replace_all()
  result <- str_replace_all(strings, ref_pattern, "")
  # Trim the result to remove any extra spaces
  result <- str_trim(result)
  return(result)
}

# Apply the remove_ref function to column_2 and column_3
df <- df %>%
  mutate(column_2 = remove_ref(column_2),
         column_3 = remove_ref(column_3))

# Then check the result
Data wrangling: detect and handle missing values

# Take a quick look at the dataset
summary(data_df)

# Option 1: drop the rows with NA values in the column
library(dplyr)
data_df <- data_df %>%
  filter(!is.na(column_X))

# Option 2: impute missing values in column_X with the column mean
mean_value <- mean(data_df$column_X, na.rm = TRUE)
data_df <- data_df %>%
  mutate(column_X = ifelse(is.na(column_X), mean_value, column_X))
EDA with SQL

Run RSQLite and establish a connection:
library("RSQLite")
db_path <- "dbname.sqlite"
con <- dbConnect(RSQLite::SQLite(), dbname = db_path)

Load data into the database:
library(readr)
dbWriteTable(con, "table_name",
             read_csv("File.csv", show_col_types = FALSE),
             overwrite = TRUE)

Counting records:
T_count <- dbGetQuery(con, "SELECT COUNT(*) AS total_records FROM table_name")

Summing a column:
T_value <- dbGetQuery(con, "SELECT SUM(column_name) AS total_value FROM table_name")

Finding averages:
A_value <- dbGetQuery(con, "SELECT AVG(column_name) AS average_value FROM table_name")

Finding minimum and maximum values:
Min_max <- dbGetQuery(con, "SELECT MIN(column_name) AS min_value,
                                   MAX(column_name) AS max_value FROM table_name")

Grouping and aggregating data:
Group_1 <- dbGetQuery(con, "
  SELECT group_column,
         COUNT(*) AS total_records,
         AVG(numeric_column) AS average_value
  FROM table_name
  GROUP BY group_column")

Data filtering:
F_data <- dbGetQuery(con, "
  SELECT *
  FROM table_name
  WHERE condition1 AND condition2 AND ...")

Detecting trends over time:
Trend_data <- dbGetQuery(con, "
  SELECT time_column,
         COUNT(*) AS total_records,
         AVG(numeric_column) AS average_value
  FROM table_name
  GROUP BY time_column
  ORDER BY time_column")
EDA with SQL

Seasonality patterns:
Season_1 <- dbGetQuery(con, "
  SELECT season_column,
         COUNT(*) AS total_records,
         AVG(numeric_column) AS average_value
  FROM table_name
  GROUP BY season_column
  ORDER BY season_column")

Outlier detection (note: base SQLite has no STDDEV aggregate; with RSQLite it can be made available via initExtension(con), where the function is named stdev):
Group_1 <- dbGetQuery(con, "
  SELECT *
  FROM table_name
  WHERE numeric_column > (SELECT AVG(numeric_column) + 2 * STDDEV(numeric_column) FROM table_name)
     OR numeric_column < (SELECT AVG(numeric_column) - 2 * STDDEV(numeric_column) FROM table_name)")

Finding similarities between groups:
F_data <- dbGetQuery(con, "
  SELECT group_column,
         AVG(numeric_column) AS average_value,
         COUNT(*) AS total_records
  FROM table_name
  GROUP BY group_column
  ORDER BY average_value DESC")

Clustering and similarity:
Trend_data <- dbGetQuery(con, "
  SELECT group_column,
         AVG(numeric_column) AS average_value,
         COUNT(*) AS total_records
  FROM table_name
  GROUP BY group_column
  HAVING AVG(numeric_column) BETWEEN some_value AND another_value
  ORDER BY average_value")
EDA with data visualization

Use tidyverse and ggplot2 in R:
library(tidyverse)
library(ggplot2)

Create histograms:
data_frame %>%
  ggplot(aes(x = numeric_column)) +
  geom_histogram(binwidth = some_value, fill = "blue", color = "black") +
  labs(title = "Histogram of numeric_column", x = "numeric_column", y = "Frequency") +
  theme_minimal()

Generate scatterplots:
data_frame %>%
  ggplot(aes(x = numeric_column1, y = numeric_column2)) +
  geom_point(color = "blue", size = 2) +
  labs(title = "Scatterplot of numeric_column1 vs numeric_column2",
       x = "numeric_column1", y = "numeric_column2") +
  theme_minimal()

Employ box plots:
data_frame %>%
  ggplot(aes(x = categorical_column, y = numeric_column)) +
  geom_boxplot(fill = "blue", color = "black") +
  labs(title = "Box Plot of numeric_column by categorical_column",
       x = "categorical_column", y = "numeric_column") +
  theme_minimal()
Predictive analysis

1. Define the objective
2. Prepare data (collect predictors and the target variable)
3. Build initial models (linear regression)
4. Identify key predictors (analyze coefficients)
5. Add polynomial and interaction terms
6. Manage complexity and overfitting
7. Apply regularization (e.g., Lasso or Ridge)
8. Evaluate models (MSE, RMSE, R-squared)
9. Refine models (adjust terms, compare)
10. Select the final model

A minimal sketch of this workflow follows below.
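The sketch below uses Tidymodels, which the project names for modeling. The data object (bike_df), column names, split proportion, and the Lasso penalty are illustrative assumptions, not the project's exact settings.

library(tidymodels)

# Split the wrangled data into training and testing sets
set.seed(1234)
split    <- initial_split(bike_df, prop = 0.8)  # bike_df: prepared dataset (assumed)
train_df <- training(split)
test_df  <- testing(split)

# Baseline linear regression
lm_spec <- linear_reg() %>% set_engine("lm") %>% set_mode("regression")
baseline_fit <- lm_spec %>%
  fit(RENTED_BIKE_COUNT ~ TEMPERATURE + HUMIDITY, data = train_df)

# Improved model: polynomial and interaction terms
improved_fit <- lm_spec %>%
  fit(RENTED_BIKE_COUNT ~ poly(TEMPERATURE, 4) + poly(HUMIDITY, 4) +
        TEMPERATURE * HUMIDITY, data = train_df)

# Regularized model: Lasso (mixture = 1) via glmnet; penalty is an assumption
lasso_spec <- linear_reg(penalty = 0.01, mixture = 1) %>% set_engine("glmnet")
lasso_fit  <- lasso_spec %>% fit(RENTED_BIKE_COUNT ~ ., data = train_df)

# Evaluate on the test set with RMSE and R-squared
test_results <- test_df %>%
  mutate(.pred = predict(lasso_fit, new_data = test_df)$.pred)
test_results %>% metrics(truth = RENTED_BIKE_COUNT, estimate = .pred)

yardstick's metrics() reports RMSE, R-squared, and MAE, matching the evaluation metrics listed above.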
Build an R Shiny dashboard

• Integrate regression models (predict hourly demand using weather, date, and time data)
• Display an interactive map (Leaflet map showing cities with predicted bike demand for the next five days)
• Enable user interaction (dropdown to select a specific city, or "All" for an overview)
• Generate detailed plots (ggplot charts of demand trends for the selected city, including temperature and humidity)
• Visualize data trends (line charts for temperature and demand over five days; scatterplot for the demand vs. humidity correlation)

A structural sketch of such an app follows below.
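A minimal structural sketch of the app described above, using shiny and leaflet. The data objects (city_df, forecast_df) and their columns are assumptions for illustration, not the project's actual code.

library(shiny)
library(leaflet)
library(ggplot2)

ui <- fluidPage(
  titlePanel("Bike-sharing Demand Prediction App"),
  sidebarLayout(
    sidebarPanel(
      # Dropdown to select a specific city, or "All" for the overview map
      selectInput("city", "Select city:",
                  choices = c("All", "Seoul", "London", "Suzhou", "New York", "Paris"))
    ),
    mainPanel(
      leafletOutput("demand_map"),  # interactive map of predicted demand
      plotOutput("trend_plot")      # demand trend for the selected city
    )
  )
)

server <- function(input, output) {
  output$demand_map <- renderLeaflet({
    df <- if (input$city == "All") city_df else subset(city_df, CITY == input$city)
    leaflet(df) %>%
      addTiles() %>%
      addCircleMarkers(lng = ~LNG, lat = ~LAT,
                       radius = ~MAX_PRED / 200,  # marker size tracks predicted demand
                       label = ~paste(CITY, "max predicted demand:", MAX_PRED))
  })
  output$trend_plot <- renderPlot({
    req(input$city != "All")                       # detailed plot needs one city
    df <- subset(forecast_df, CITY == input$city)  # hourly forecasts (assumed)
    ggplot(df, aes(x = FORECASTDATETIME, y = BIKE_PREDICTION)) +
      geom_line() +
      labs(x = "Time", y = "Predicted bike demand")
  })
}

shinyApp(ui, server)

The temperature line chart and the demand vs. humidity scatterplot described above follow the same renderPlot pattern.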
Results
• Exploratory data analysis results

• Predictive analysis results

• A dashboard demo in screenshots

EDA with SQL

Busiest bike rental times

• The result is a data frame displaying the top 10 bike-rental records from the SEOUL_BIKE_SHARING table, showing the highest rental counts together with their DATE and HOUR.

• The data shows a consistent pattern of high rentals during the same hour (18:00) across multiple days in June and September 2018, suggesting that this time slot is particularly popular for bike rentals.
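A hedged reconstruction of the query behind this result; the column list and ordering are assumptions based on the description.

busiest <- dbGetQuery(con, "
  SELECT DATE, HOUR, RENTED_BIKE_COUNT
  FROM SEOUL_BIKE_SHARING
  ORDER BY RENTED_BIKE_COUNT DESC
  LIMIT 10")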
Hourly popularity and temperature by season

To find hourly popularity and temperature by season:

• The query retrieves the average temperature and average bike rentals for each season and hour from the SEOUL_BIKE_SHARING table.
• It groups the data by both SEASONS and HOUR, then orders the results by average bike rentals in descending order.
• The top 10 results are stored in the variable avg_hourly_temp_bikes.
• The table shows that the highest average bike rentals occur during the summer months, particularly in the late afternoon and early evening hours. The average temperature during these peak rental hours is generally warm, suggesting that people in Seoul tend to use bike-sharing services more frequently on warm evenings.
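A hedged reconstruction of this query; the variable name comes from the text, while the alias names are assumptions.

avg_hourly_temp_bikes <- dbGetQuery(con, "
  SELECT SEASONS, HOUR,
         AVG(TEMPERATURE) AS avg_temperature,
         AVG(RENTED_BIKE_COUNT) AS avg_bike_count
  FROM SEOUL_BIKE_SHARING
  GROUP BY SEASONS, HOUR
  ORDER BY avg_bike_count DESC
  LIMIT 10")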
Rental seasonality

• The result retrieves seasonal bike rental statistics from the SEOUL_BIKE_SHARING table: the average, minimum, maximum, and standard deviation of bike rentals for each season.
• Overall, this data frame provides insight into seasonal patterns in bike rentals, highlighting that Summer not only has the highest average rentals but also the greatest variability, while Winter shows lower averages and more consistent rental counts.
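A hedged reconstruction of this query. Because base SQLite has no STDDEV aggregate, the standard deviation is derived from averages here; the original lab may instead have loaded RSQLite's extension functions.

seasonal_stats <- dbGetQuery(con, "
  SELECT SEASONS,
         AVG(RENTED_BIKE_COUNT) AS avg_count,
         MIN(RENTED_BIKE_COUNT) AS min_count,
         MAX(RENTED_BIKE_COUNT) AS max_count,
         SQRT(AVG(RENTED_BIKE_COUNT * RENTED_BIKE_COUNT)
              - AVG(RENTED_BIKE_COUNT) * AVG(RENTED_BIKE_COUNT)) AS std_count
  FROM SEOUL_BIKE_SHARING
  GROUP BY SEASONS")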
Weather seasonality

• The result retrieves seasonal statistics from the SEOUL_BIKE_SHARING table: averages of the rented bike count and of weather conditions such as temperature, humidity, wind speed, visibility, dew point temperature, solar radiation, rainfall, and snowfall for each season.

• Overall, this data highlights seasonal trends in weather conditions and bike usage, showing that warmer seasons correlate with higher bike rentals and more favorable weather.
Bike-sharing info in Seoul

• The query joins two tables, WORLD_CITIES and BIKE_SHARING_SYSTEMS, to get information about bike sharing in Seoul.

• The resulting data frame includes the city name, country, latitude, longitude, and population, and indicates that there are 20,000 bikes in the bike-sharing system in Seoul, South Korea.
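A hedged reconstruction of the join; the join key and column names are assumptions based on the description.

seoul_info <- dbGetQuery(con, "
  SELECT W.CITY, W.COUNTRY, W.LAT, W.LNG, W.POPULATION, B.BICYCLES
  FROM WORLD_CITIES W
  JOIN BIKE_SHARING_SYSTEMS B ON W.CITY = B.CITY
  WHERE W.CITY = 'Seoul'")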
Cities similar to Seoul

• The code retrieves data about cities whose number of bicycles is between 15,000 and 20,000 from the WORLD_CITIES and BIKE_SHARING_SYSTEMS tables.
• A join operation is performed on the city names to combine information from both tables.
• The result provides insight into cities with moderately sized bike-sharing systems, particularly in China, showing their geographical coordinates and population alongside the number of bicycles available.
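The same join with a BETWEEN filter reproduces this result; again a hedged sketch with assumed column names.

similar_cities <- dbGetQuery(con, "
  SELECT W.CITY, W.COUNTRY, W.LAT, W.LNG, W.POPULATION, B.BICYCLES
  FROM WORLD_CITIES W
  JOIN BIKE_SHARING_SYSTEMS B ON W.CITY = B.CITY
  WHERE B.BICYCLES BETWEEN 15000 AND 20000")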
EDA with Visualization

Bike rental vs. Date

The plot shows rented bike counts over time, with denser clusters of points toward the middle of 2018, indicating higher bike rental activity around Summer and Autumn. The counts drop off at both ends (early and late 2018), suggesting lower rentals in colder months or off-peak times.
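A hedged sketch of such a plot, assuming a seoul_bike data frame with DATE and RENTED_BIKE_COUNT columns.

library(ggplot2)
ggplot(seoul_bike, aes(x = DATE, y = RENTED_BIKE_COUNT)) +
  geom_point(alpha = 0.3, color = "blue") +
  labs(title = "Rented bike count vs. date", x = "Date", y = "Rented bike count")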
Bike rental vs. Datetime

• This plot helps to observe both seasonal and hourly trends in bike rentals.
• It reveals a consistent pattern of high rentals during the same hour (18:00) throughout 2018, suggesting that this time slot is particularly popular for bike rentals.
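A hedged sketch, adding HOUR as the color aesthetic so the hourly structure is visible within the yearly trend (names assumed as before).

ggplot(seoul_bike, aes(x = DATE, y = RENTED_BIKE_COUNT, color = HOUR)) +
  geom_point(alpha = 0.4) +
  labs(title = "Rented bike count vs. datetime", x = "Date", y = "Rented bike count")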
Bike rental histogram

• This plot shows bike rentals with an overlaid kernel density curve.
• It reveals both the discrete and continuous distributions of bike rental counts.
• The distribution is right-skewed: there are more days with lower rented bike counts than with higher counts.
• The peak of the distribution is around 500 rented bikes, suggesting that this is the most common number of bikes rented in a day.
• There are some outliers on the right side of the plot, representing days with exceptionally high numbers of rented bikes. These might be due to special events, holidays, or other factors that increase demand for bike rentals.
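A hedged sketch of a histogram with an overlaid kernel density curve; the binwidth is an assumption.

ggplot(seoul_bike, aes(x = RENTED_BIKE_COUNT)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 100,
                 fill = "lightblue", color = "black") +
  geom_density(color = "red") +
  labs(title = "Distribution of rented bike count",
       x = "Rented bike count", y = "Density")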
Daily total rainfall and snowfall

The plot provides a clear comparison of daily total rainfall and snowfall over 2018. Rainfall was the predominant form of precipitation, with higher frequency and intensity during the warmer months (April to October), while snowfall was concentrated in the colder months, primarily January.
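A hedged sketch of the comparison, assuming hourly RAINFALL and SNOWFALL columns summed per day.

library(dplyr)
library(tidyr)

daily_precip <- seoul_bike %>%
  group_by(DATE) %>%
  summarise(RAINFALL = sum(RAINFALL), SNOWFALL = sum(SNOWFALL)) %>%
  pivot_longer(c(RAINFALL, SNOWFALL), names_to = "type", values_to = "amount")

ggplot(daily_precip, aes(x = DATE, y = amount, color = type)) +
  geom_line() +
  labs(title = "Daily total rainfall and snowfall", x = "Date", y = "Amount")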
Predictive analysis

Ranked coefficients

• Based on the chart, the most important factors influencing bike-sharing demand are weather-related variables such as RAINFALL, HUMIDITY, DEW_POINT_TEMPERATURE, and TEMPERATURE. These variables appear to have a stronger impact on bike-sharing usage than seasonal variations, holidays, or other environmental factors such as SOLAR_RADIATION, SNOWFALL, VISIBILITY, and WIND_SPEED.

• Among the HOUR dummy variables, we can see which times of day correlate strongly with bike-sharing demand, suggesting that those time slots are particularly popular for bike rentals.
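A hedged sketch of how such a ranked-coefficient chart can be produced from a fitted model using broom::tidy(); the model object name is an assumption.

library(broom)
library(dplyr)
library(ggplot2)

coef_df <- tidy(baseline_fit) %>%        # one row per term: estimate, std.error, ...
  filter(term != "(Intercept)") %>%
  mutate(abs_estimate = abs(estimate)) %>%
  arrange(desc(abs_estimate))

# Horizontal bar chart of coefficient magnitudes, largest first
ggplot(coef_df, aes(x = reorder(term, abs_estimate), y = abs_estimate)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(x = "Variable", y = "|Coefficient|")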
Model evaluation

Built at least five different models using polynomial terms, interaction terms, and regularization.
Find the best performing model

• Best model based on RMSE (308.3473): Lasso model
• Best model based on R-squared (0.7628875): Lasso model
• Model formula:

formula <- RENTED_BIKE_COUNT ~ poly(TEMPERATURE, 6) +
  poly(DEW_POINT_TEMPERATURE, 6) + SUMMER + poly(SOLAR_RADIATION, 6) +
  H_18 + poly(VISIBILITY, 3) + AUTUMN + H_19 + H_17 + poly(WIND_SPEED, 5) +
  H_20 + H_21 + H_8 + H_16 + H_22 + NO_HOLIDAY + H_15 + H_14 + SPRING +
  H_13 + H_12 + H_23 + H_9 + H_7 + H_11 + H_0 + H_10 + HOLIDAY + H_1 +
  poly(RAINFALL, 6) + H_2 + H_6 + poly(SNOWFALL, 6) + H_3 + H_5 + H_4 +
  poly(HUMIDITY, 5) + WINTER
Q-Q plot of the best model
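The Q-Q figure itself is not reproduced here; below is a hedged sketch of one way such a plot can be drawn, assuming a test_results data frame holding the truth column and a .pred column of model predictions.

ggplot(test_results) +
  stat_qq(aes(sample = RENTED_BIKE_COUNT), color = "green") +
  stat_qq(aes(sample = .pred), color = "red") +
  labs(title = "Q-Q plot: actual (green) vs. predicted (red) rented bike count")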
Dashboard

Max bike prediction overview map

• Bike-sharing Demand Prediction App: the title of the application.
• World Map: depicts the globe, providing visual context for the data being displayed.
• City Markers: show the predicted bike-rental demand for each city.
• Weather Indicators: show the predicted weather conditions.
• City Selector: allows users to select a specific city for detailed analysis; options include "All" for a global view.
Predicted bike-sharing demand in London
• Map: shows the location of London.
• City Marker (green): indicates low predicted bike-rental demand.
• Weather Information: predicted weather conditions in London.
• Temperature Chart: shows the temperature in London over time.
• Bike Count Prediction Chart: displays the predicted number of bikes in London for the next three hours.
• Time and Bike Count Prediction: indicates the time of access and the corresponding predicted bike demand.
• Bike Prediction Chart: shows the relationship between bike predictions and humidity levels.
Predicted bike-sharing demand in Seoul
• Map: shows the location of Seoul.
• City Marker (yellow): indicates medium predicted bike-rental demand.
• Weather Information: predicted weather conditions in Seoul.
• Temperature Chart: shows the temperature in Seoul over time.
• Bike Count Prediction Chart: displays the predicted number of bikes in Seoul for the next three hours.
• Time and Bike Count Prediction: indicates the time of access and the corresponding predicted bike demand.
• Bike Prediction Chart: shows the relationship between bike predictions and humidity levels.
Predicted bike-sharing demand in Suzhou
• Map: shows the location of Suzhou.
• City Marker (yellow): indicates medium predicted bike-rental demand.
• Weather Information: predicted weather conditions in Suzhou.
• Temperature Chart: shows the temperature in Suzhou over time.
• Bike Count Prediction Chart: displays the predicted number of bikes in Suzhou for the next three hours.
• Time and Bike Count Prediction: indicates the time of access and the corresponding predicted bike demand.
• Bike Prediction Chart: shows the relationship between bike predictions and humidity levels.
Predicted bike-sharing demand in New York
• Map: shows the location of New York.
• City Marker (yellow): indicates medium predicted bike-rental demand.
• Weather Information: predicted weather conditions in New York.
• Temperature Chart: shows the temperature in New York over time.
• Bike Count Prediction Chart: displays the predicted number of bikes in New York for the next three hours.
• Time and Bike Count Prediction: indicates the time of access and the corresponding predicted bike demand.
• Bike Prediction Chart: shows the relationship between bike predictions and humidity levels.
Predicted bike-sharing demand in Paris
• Map: shows the location of Paris.
• City Marker (yellow): indicates medium predicted bike-rental demand.
• Weather Information: predicted weather conditions in Paris.
• Temperature Chart: shows the temperature in Paris over time.
• Bike Count Prediction Chart: displays the predicted number of bikes in Paris for the next three hours.
• Time and Bike Count Prediction: indicates the time of access and the corresponding predicted bike demand.
• Bike Prediction Chart: shows the relationship between bike predictions and humidity levels.
CONCLUSION
• The comprehensive workflow undertaken in this project highlights the critical stages of data handling, modeling, and visualization, ultimately aimed at predicting bike demand from weather and time factors.
• Using SQL together with visualization tools such as Tidyverse and ggplot2 enabled a thorough exploration of the data, yielding insights into patterns and trends that informed the modeling effort.
• Building a baseline linear regression model, followed by polynomial and regularized models with Tidymodels, illustrated the iterative process of model refinement needed to identify the best-performing approach for predicting bike demand.
• The R Shiny application integrates the regression models for hourly bike demand prediction, featuring an interactive map and detailed visualizations that enhance user engagement and provide insight into demand trends and variable relationships such as weather, date, time, and humidity.
APPENDIX 1. Data Collection
Web Scraping Notebook and OpenWeather API Notebook

APPENDIX 2. Data wrangling

APPENDIX 3. EDA with SQL

APPENDIX 3. EDA with data visualization

APPENDIX 4. Predictive analysis

APPENDIX 5. Build an R Shiny dashboard
Predicted bike-sharing demand in London, Seoul, Suzhou, New York, and Paris