
Name: Abhishek Chauhan

Branch: CSE
Semester: VIII
Roll No.: 23005
Certificate

Certified that this practical file entitled “Big Data Analysis”, submitted by Abhishek Chauhan, Roll No. 23005,
student of the Computer Science Engineering Department, Dronacharya College of Engineering,
Gurugram, in partial fulfillment of the requirement for the award of the Bachelor of
Technology (Computer Science and Engineering) degree of MDU, Rohtak, is a record of the
student's own study carried out under my supervision and guidance.

Submitted To: HOD:


Prof. Pooja Khot Dr. Ashima Mehta

Sign.: ……………………. Sign.: ………………….


LIST OF EXPERIMENTS
S.No Experiment Sign

1  Write a Python program that demonstrates a basic process of extracting value from big
   data using a machine learning model.

2  Write a program that works with big data on a distributed computing platform such as
   Apache Spark and compares it with a Relational Database Management System (RDBMS).

3  Write a program for a Big Data Data Lake scenario, which involves working with a
   distributed storage system and possibly using a query language like SQL or tools like
   Apache Spark. A simplified example uses Python, PySpark, and the Spark SQL API to
   demonstrate working with data stored in a Data Lake.

4  Write a Python program using PySpark, the Python API for Apache Spark, to demonstrate
   basic data processing.

5  Write a Python program using SQLAlchemy, a popular SQL toolkit and Object-Relational
   Mapping (ORM) library, to model data and interact with a relational database.

6  Write a Python program that demonstrates basic data operations using the pandas
   library, a popular library for data manipulation and analysis in Python.

7  Write programs in Python and PySpark to demonstrate data ingestion into the Hadoop
   Distributed File System (HDFS) and Apache Kafka.

8  Write a program demonstrating real-life applications of big data, which span various
   industries and address challenges related to large-scale data processing, analysis,
   and the extraction of valuable insights.

9  Write a Python program that simulates a basic big data processing scenario using
   Apache Spark for querying data: it loads a large dataset (a CSV file for simplicity),
   performs some data preprocessing, and then executes SQL-like queries on the data
   using Spark SQL.

10 Write a Python program using Apache Beam, a popular open-source data processing SDK,
   to create a basic data pipeline.

11 Write a Python program that demonstrates basic analytical operations using Apache
   Spark, using PySpark to perform analytics on a large dataset.
LAB EXPERIMENT 1
OBJECTIVE: Write a Python program that demonstrates a basic process of extracting value
from big data using a machine learning model.

PRE-EXPERIMENT QUESTIONS:

1. What is a big data program, and how is it implemented?

2. What are the requirements for extracting value from big data using a machine learning model?

BRIEF DISCUSSION AND EXPLANATION:

# Import necessary libraries

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import accuracy_score

# Step 1: Load and preprocess big data

# Assuming 'big_data.csv' is a large dataset with features and target variable

data = pd.read_csv('big_data.csv')

# Preprocess data as needed (handle missing values, encode categorical variables, etc.)

# For simplicity, let's assume the target variable is binary (0 or 1)

# Step 2: Split data into training and testing sets

X = data.drop('target_variable', axis=1) # Features

y = data['target_variable'] # Target variable


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Train a machine learning model (Random Forest in this example)

model = RandomForestClassifier(n_estimators=100, random_state=42)

model.fit(X_train, y_train)

# Step 4: Make predictions on the test set

y_pred = model.predict(X_test)

# Step 5: Evaluate the model's performance

accuracy = accuracy_score(y_test, y_pred)

print(f'Model Accuracy: {accuracy * 100:.2f}%')

# Step 6: Extract value based on model predictions

# Depending on your business case, you might use the model predictions to make decisions or take actions.

# Additional steps for deployment, monitoring, and continuous improvement would be necessary in a real-world scenario.

Output:

print(f'Model Accuracy: {accuracy * 100:.2f}%')

Model Accuracy: 85.00%
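
As a hedged illustration of Step 6, the predictions could be turned into a simple follow-up action, such as flagging the records the model predicts as class 1. This is a sketch only: the 'predicted_target' column name and the 'flagged_records.csv' output file are assumptions for illustration, and it reuses X_test and y_pred from the steps above.

# Illustrative sketch: act on the model's predictions (assumed binary target)
results = X_test.copy()
results['predicted_target'] = y_pred

# Keep only the records predicted as positive (class 1) for follow-up action
flagged = results[results['predicted_target'] == 1]
print(f"Records flagged for action: {len(flagged)}")

# Persist the flagged subset so a downstream process can act on it (hypothetical file name)
flagged.to_csv('flagged_records.csv', index=False)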


LAB EXPERIMENT 2

OBJECTIVE: Write a program that works with big data on a distributed computing platform
such as Apache Spark and compares it with a Relational Database Management System
(RDBMS).
PRE-EXPERIMENT QUESTIONS:

1. What is a big data program, and how is it implemented?

2. What are the requirements for processing big data on a distributed computing platform?
BRIEF DISCUSSION AND EXPLANATION

from pyspark.sql import SparkSession


# Create a Spark session
spark = SparkSession.builder.appName("BigDataExample").getOrCreate()
# Read a large dataset (assuming a Parquet file for illustration)
big_data_df = spark.read.parquet("big_data.parquet")
# Perform a simple data transformation
processed_data = big_data_df.filter(big_data_df['column_name'] > 50)
# Show the result
processed_data.show()
# Stop the Spark session
spark.stop()
import sqlite3
# Create a SQLite connection
conn = sqlite3.connect('database.db')
cursor = conn.cursor()
# Assuming you have a large table named 'big_data_table' with a column 'column_name'
# Execute a SQL query to process the data
cursor.execute("SELECT * FROM big_data_table WHERE column_name > 50")
# Fetch and print the result
result = cursor.fetchall()
for row in result:
    print(row)
# Close the database connection
conn.close()
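
Since the objective is to compare the two approaches, a minimal timing sketch is given below. It is illustrative only: it assumes big_data_df (Spark) and the SQLite connection from the code above are still open (i.e., it runs before spark.stop() and conn.close()), and it uses Python's time module only as a rough indicator rather than a proper benchmark.

import time

# Rough, illustrative comparison of the same filter in Spark and SQLite
start = time.time()
spark_count = big_data_df.filter(big_data_df['column_name'] > 50).count()
print(f"Spark count: {spark_count}, elapsed: {time.time() - start:.2f}s")

start = time.time()
cursor.execute("SELECT COUNT(*) FROM big_data_table WHERE column_name > 50")
sqlite_count = cursor.fetchone()[0]
print(f"SQLite count: {sqlite_count}, elapsed: {time.time() - start:.2f}s")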
LAB EXPERIMENT 3

OBJECTIVE: Write a program for a Big Data Data Lake scenario, which involves working with a
distributed storage system and possibly using a query language like SQL or tools like Apache
Spark. Below is a simplified example using Python, PySpark, and the Spark SQL API to
demonstrate working with data stored in a Data Lake.

PRE-EXPERIMENT QUESTIONS:

1. What are Data Lakes?

2. What are PySpark and the Spark SQL API?

BRIEF DISCUSSION AND EXPLANATION:

from pyspark.sql import SparkSession

# Create a Spark session

spark = SparkSession.builder.appName("DataLakeExample").getOrCreate()

# Read data from a file in a Data Lake (Assuming a parquet file for illustration)

data_lake_df = spark.read.parquet("data_lake_file.parquet")

# Perform a basic data analysis using Spark SQL

data_lake_df.createOrReplaceTempView("data_lake_table")

result = spark.sql("SELECT column1, AVG(column2) AS avg_column2 FROM data_lake_table GROUP BY column1")

# Show the result

result.show()

# Write the result back to the Data Lake (Assuming a parquet file for illustration)
result.write.parquet("output_result.parquet")

# Stop the Spark session

spark.stop()

In this example:

• data_lake_file.parquet represents a file stored in a Data Lake, and the program reads the
data from it.
• The program then uses Spark SQL to perform a basic analysis (calculating the average of
column2 grouped by column1); an equivalent DataFrame API version is sketched after this list.
• The result is shown using the show() method.
• The processed data is written back to the Data Lake, represented by the
output_result.parquet file.
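
The same aggregation can also be expressed with the DataFrame API instead of an SQL string. The sketch below is a minimal illustration assuming the same data_lake_df and placeholder column names as in the program above.

from pyspark.sql import functions as F

# Equivalent aggregation using the DataFrame API instead of Spark SQL
avg_df = data_lake_df.groupBy("column1").agg(F.avg("column2").alias("avg_column2"))
avg_df.show()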

POST EXPERIMENT QUESTIONS:

1. What is the output of the program? Explain the roles of Python, PySpark, and Spark.
2. What is a Big Data Data Lake scenario?
LAB EXPERIMENT 4

OBJECTIVE: Write a Python program using PySpark, the Python API for Apache Spark, to
demonstrate basic data processing.
PRE-EXPERIMENT QUESTIONS:

1. What is a Python API?

2. Explain basic data processing.
BRIEF DISCUSSION AND EXPLANATION:
from pyspark.sql import SparkSession
# Create a Spark session
spark = SparkSession.builder.appName("BigDataProcessingExample").getOrCreate()
# Read a large dataset (assuming a CSV file for illustration)
big_data_df = spark.read.csv("big_data.csv", header=True, inferSchema=True)
# Perform a simple data transformation (example: calculate the sum of a column)
total_sum = big_data_df.agg({"column_name": "sum"}).collect()[0][0]
# Show the result
print(f"Total sum of 'column_name': {total_sum}")
# Stop the Spark session
spark.stop()

Output: In this example:

• big_data.csv represents a large dataset, and the program reads the data using PySpark.
• The program performs a basic data transformation, calculating the sum of a column
('column_name' in this example).
• The result is then printed.

Please note that you would need to replace "big_data.csv" and "column_name" with your actual
dataset and column name. Additionally, in a real-world scenario, you would likely perform more
complex transformations and analyses on the data.

The output of this program would be the calculated sum of the specified column, as indicated by the
print statement. This is a basic example, and actual processing tasks would depend on the specific
requirements and nature of your big data.
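
For reference, the same sum can be computed with PySpark's column functions, which avoids indexing into the result of collect(). This is a sketch under the same assumptions, reusing big_data_df and the 'column_name' placeholder from the program above.

from pyspark.sql import functions as F

# Same aggregation via the functions API; returns a single-row DataFrame
total_sum_row = big_data_df.select(F.sum("column_name").alias("total")).first()
print(f"Total sum of 'column_name': {total_sum_row['total']}")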

POST EXPERIMENT QUESTIONS:

1. What is CSV?

2. Explain Data transformation?


LAB EXPERIMENT 5

OBJECTIVE: Write a Python program using SQLAlchemy, a popular SQL toolkit and Object-Relational
Mapping (ORM) library, to model data and interact with a relational database.

PRE-EXPERIMENT QUESTIONS:

1. What is a SQL toolkit, and what is Object-Relational Mapping?

2. How is an Object-Relational Mapping (ORM) library used to model data and interact with a
relational database?

BRIEF DISCUSSION AND EXPLANATION:

from sqlalchemy import create_engine, Column, Integer, String, MetaData

from sqlalchemy.orm import declarative_base, Session

# Define the database connection

DATABASE_URL = "sqlite:///example.db"

engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})

# Define the data model using SQLAlchemy ORM

Base = declarative_base()

class BigDataModel(Base):

    __tablename__ = "big_data_table"

    id = Column(Integer, primary_key=True, index=True)

    column1 = Column(String, index=True)

    column2 = Column(Integer)

# Create the database tables

Base.metadata.create_all(bind=engine)
# Create a session to interact with the database

db_session = Session(engine)

# Example: Inserting data into the database

new_data_entry = BigDataModel(column1="example_data", column2=42)

db_session.add(new_data_entry)

db_session.commit()

# Example: Querying data from the database

queried_data = db_session.query(BigDataModel).filter(BigDataModel.column1 == "example_data").first()

print("Queried Data:", queried_data.__dict__)

# Close the database session

db_session.close()

POST EXPERIMENT QUESTIONS:

1. How do you insert a new data entry into the database and query it?

2. Explain in detail the BigDataModel class defined using SQLAlchemy.


LAB EXPERIMENT 6

OBJECTIVE: Write a Python program that demonstrates basic data operations using the pandas
library. Pandas is a popular library for data manipulation and analysis in Python.

PRE-EXPERIMENT QUESTIONS:

1. What are data operations?

2. What is data manipulation?

BRIEF DISCUSSION AND EXPLANATION:

import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
'Age': [25, 30, 22, 35, 28],
'Salary': [50000, 60000, 45000, 70000, 55000]}
df = pd.DataFrame(data)
# Display the original DataFrame
print("Original DataFrame:")
print(df)
print("\n")
# Data Operations:
# 1. Selecting columns
selected_columns = df[['Name', 'Age']]
print("Selected Columns:")
print(selected_columns)

print("\n")

# 2. Filtering data based on a condition

filtered_data = df[df['Age'] > 25]

print("Filtered Data (Age > 25):")

print(filtered_data)
print("\n")
# 3. Sorting by a column
sorted_data = df.sort_values(by='Salary', ascending=False)
print("Sorted Data (by Salary):")
print(sorted_data)
print("\n")
# 4. Grouping data and calculating aggregates
grouped_data = df.groupby('Age')[['Salary']].mean()
print("Grouped Data (Average Salary by Age):")
print(grouped_data)
print("\n")
# 5. Adding a new column
df['Bonus'] = df['Salary'] * 0.1
print("DataFrame with Bonus Column:")
print(df)
print("\n")
# 6. Deleting a column
df = df.drop('Bonus', axis=1)
print("DataFrame after removing the Bonus column:")
print(df)
print("\n")

# 7. Renaming columns

df = df.rename(columns={'Age': 'Years'})

print("DataFrame after renaming the Age column to Years:")

print(df)

Make sure to install the pandas library before running this program if you haven't already:
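
pip install pandas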

POST EXPERIMENT QUESTIONS:


1. What is the pandas library?
2. What is a DataFrame?
LAB EXPERIMENT 7

OBJECTIVE: Write programs in Python and PySpark to demonstrate data ingestion into the Hadoop
Distributed File System (HDFS) and Apache Kafka.

PRE-EXPERIMENT QUESTIONS:

1. What is data ingestion?

2. What are the common operations in data ingestion?

BRIEF DISCUSSION AND EXPLANATION:

from pyspark.sql import SparkSession

# Create a Spark session

spark = SparkSession.builder.appName("DataIngestionToHDFS").getOrCreate()

# Read data from a CSV file (assuming it's a large dataset)

input_file_path = "your_large_dataset.csv"

data_df = spark.read.csv(input_file_path, header=True, inferSchema=True)

# Write the data to HDFS (assuming it's running on localhost)

hdfs_output_path = "hdfs://localhost:9000/user/your_username/your_output_path"

data_df.write.mode("overwrite").parquet(hdfs_output_path)

# Stop the Spark session

spark.stop()

Ensure you have a running HDFS instance, and adjust the input_file_path and
hdfs_output_path accordingly.

from kafka import KafkaProducer


# Create a Kafka producer

producer = KafkaProducer(bootstrap_servers='localhost:9092')

# Read data from a file (assuming a text file for simplicity)

input_file_path = "your_data_file.txt"

with open(input_file_path, 'r') as file:

    for line in file:

        # Send each line as a message to the Kafka topic
        producer.send('your_kafka_topic', value=line.encode('utf-8'))

# Close the Kafka producer

producer.close()

Make sure you have a running Kafka broker and adjust the input_file_path and
your_kafka_topic accordingly.

These are basic examples, and in real-world scenarios, you would likely deal with more
complex data formats, configurations, and error handling. Additionally, you may need to
consider tools like Apache NiFi for comprehensive data ingestion pipelines.

Ensure you have the required libraries installed before running the programs:

pip install pyspark kafka-python
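
To confirm that the HDFS ingestion worked, one hedged check is to read the written Parquet data back with Spark. The sketch below assumes the same hdfs_output_path used above and a locally running HDFS; it is a verification aid, not part of the ingestion itself.

from pyspark.sql import SparkSession

# Optional verification: read the ingested data back from HDFS and inspect it
spark = SparkSession.builder.appName("VerifyIngestion").getOrCreate()
check_df = spark.read.parquet("hdfs://localhost:9000/user/your_username/your_output_path")
print(f"Rows ingested into HDFS: {check_df.count()}")
check_df.show(5)
spark.stop()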

POST EXPERIMENT QUESTIONS:

1. What is Kafka?
2. What are the common operations performed on a Kafka topic?
LAB EXPERIMENT 8

OBJECTIVE: Write a program demonstrating real-life applications of big data, which span various
industries and address challenges related to large-scale data processing, analysis, and the
extraction of valuable insights.
PRE-EXPERIMENT QUESTIONS:
1. What is large-scale data processing?
2. How are valuable insights extracted from big data?
BRIEF DISCUSSION AND EXPLANATION:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
# Load a sample e-commerce dataset (products and user interactions)
ecommerce_data = pd.read_csv("ecommerce_data.csv")
# Perform data preprocessing (cleaning, handling missing values, etc.)
# Create a TF-IDF vectorizer to convert product descriptions into numerical features
tfidf_vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf_vectorizer.fit_transform(ecommerce_data['product_description'].fillna(''))
# Calculate the cosine similarity between products based on their descriptions
cosine_similarity = linear_kernel(tfidf_matrix, tfidf_matrix)
# Function to get personalized product recommendations for a given product ID
def get_recommendations(product_id, cosine_sim=cosine_similarity):
    idx = ecommerce_data.index[ecommerce_data['product_id'] == product_id].tolist()[0]
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[1:6]
    product_indices = [i[0] for i in sim_scores]
    return ecommerce_data['product_name'].iloc[product_indices]
# Example: Get recommendations for a specific product (change product_id accordingly)
product_id_to_recommend_for = 12345
recommendations = get_recommendations(product_id_to_recommend_for)
# Display the recommendations
print(f"Top 5 Recommendations for Product ID {product_id_to_recommend_for}:\n")
for i, product_name in enumerate(recommendations, start=1):
    print(f"{i}. {product_name}")

POST EXPERIMENT QUESTIONS:


1. How is cosine similarity calculated between products, based on their descriptions, to identify
similar products?
LAB EXPERIMENT 9

OBJECTIVE: Write a Python program that simulates a basic big data processing scenario using
Apache Spark for querying data. The program loads a large dataset (a CSV file for simplicity),
performs some data preprocessing, and then executes SQL-like queries on the data using Spark SQL.

PRE-EXPERIMENT QUESTIONS:

1. Explain a basic big data processing scenario.

2. How does Apache Spark query data?

BRIEF DISCUSSION AND EXPLANATION:

from pyspark.sql import SparkSession

# Create a Spark session

spark = SparkSession.builder.appName("BigDataQueryExample").getOrCreate()

# Read a large dataset (assuming a CSV file for illustration)

big_data_df = spark.read.csv("big_data.csv", header=True, inferSchema=True)

# Display the original DataFrame

print("Original DataFrame:")

big_data_df.show(truncate=False)

print("\n")

# Perform data preprocessing (select relevant columns, filter data, etc.)


processed_data = big_data_df.select("column1", "column2").filter(big_data_df["column2"] > 50)

# Display the processed DataFrame

print("Processed DataFrame:")

processed_data.show(truncate=False)

print("\n")

# Execute SQL-like queries on the data using Spark SQL

processed_data.createOrReplaceTempView("processed_data_table")

query_result = spark.sql("SELECT column1, AVG(column2) AS avg_column2 FROM processed_data_table GROUP BY column1")

# Display the result of the query

print("Query Result:")

query_result.show(truncate=False)

# Stop the Spark session

spark.stop()
LAB EXPERIMENT 10

OBJECTIVE: Write a Python program using Apache Beam, a popular open-source data processing SDK,
to create a basic data pipeline.

PRE-EXPERIMENT QUESTIONS:

1. What is Apache Beam?

2. What are some popular open-source data processing tools?

BRIEF DISCUSSION AND EXPLANATION:

import apache_beam as beam

# Sample data (replace this with your actual data source)

data = [
    {'user_id': 1, 'event_type': 'click'},
    {'user_id': 2, 'event_type': 'purchase'},
    {'user_id': 3, 'event_type': 'click'},
    # ... more data ...
]

# Apache Beam pipeline
with beam.Pipeline() as pipeline:

    # Step 1: Read data from a source (replace 'data' with your actual data source)
    events = pipeline | 'ReadEvents' >> beam.Create(data)

    # Step 2: Apply transformations (example: count events by type)
    event_counts = (
        events
        | 'MapEventType' >> beam.Map(lambda event: (event['event_type'], 1))
        | 'CountByEventType' >> beam.CombinePerKey(sum)
    )

    # Step 3: Write the results to an output (replace 'output' with your actual output destination)
    event_counts | 'WriteResults' >> beam.io.WriteToText('output')

• The ReadEvents step reads data from a source. Replace 'data' with your actual data
source, such as reading from a file, a database, or a streaming source (a sketch of a
file-based variant follows this list).
• The MapEventType step applies a transformation to convert each event into a key-value
pair with the event type as the key and 1 as the value.
• The CountByEventType step uses CombinePerKey to count the occurrences of each
event type.
• The WriteResults step writes the final results to an output destination. Replace 'output'
with your actual output destination, which could be a file, database, or another storage
system.
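
As mentioned in the first point, the in-memory beam.Create source can be swapped for a file source. The sketch below is an assumed variant that reads newline-delimited JSON events from a hypothetical 'events.jsonl' file with beam.io.ReadFromText, parses each line, and then applies the same counting steps as above.

import json
import apache_beam as beam

# Hedged variant: read events from a text file instead of an in-memory list
with beam.Pipeline() as pipeline:
    (
        pipeline
        | 'ReadFromFile' >> beam.io.ReadFromText('events.jsonl')  # hypothetical input file
        | 'ParseJson' >> beam.Map(json.loads)
        | 'MapEventType' >> beam.Map(lambda event: (event['event_type'], 1))
        | 'CountByEventType' >> beam.CombinePerKey(sum)
        | 'WriteResults' >> beam.io.WriteToText('output')
    )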

POST EXPERIMENT QUESTIONS:

1. How do you read data from a database into a pipeline?

2. Explain writing pipeline output to a database or another storage system.


LAB EXPERIMENT 11

OBJECTIVE: Write a Python program that demonstrates basic analytical operations using Apache
Spark. This example uses PySpark to perform analytics on a large dataset.

PRE-EXPERIMENT QUESTIONS:

1. What are analytical operations?

2. Explain how PySpark is used to perform analytical operations.

BRIEF DISCUSSION AND EXPLANATION:

from pyspark.sql import SparkSession

from pyspark.sql.functions import col

# Create a Spark session

spark = SparkSession.builder.appName("BigDataAnalyticsExample").getOrCreate()

# Read a large dataset (assuming a CSV file for illustration)

big_data_df = spark.read.csv("big_data.csv", header=True, inferSchema=True)

# Display the original DataFrame

print("Original DataFrame:")

big_data_df.show(truncate=False)

print("\n")
# Analytical Operations:

# 1. Calculate the total count of rows

total_rows = big_data_df.count()

print(f"Total Rows: {total_rows}\n")

# 2. Calculate the summary statistics for numerical columns
# (describe() returns a DataFrame; show() prints it and returns None, so print the header first)

print("Summary Statistics:")

big_data_df.describe().show(truncate=False)

print("\n")

# 3. Group by a categorical column and calculate the average of a numerical column

average_by_category = big_data_df.groupBy("category").agg({"value": "avg"})

print("Average Value by Category:")

average_by_category.show(truncate=False)

print("\n")

# 4. Filter data based on a condition

filtered_data = big_data_df.filter(col("value") > 50)

print("Filtered Data (Value > 50):")


filtered_data.show(truncate=False)

print("\n")

# 5. Calculate the correlation between two numerical columns

correlation = big_data_df.stat.corr("value1", "value2")

print(f"Correlation between Value1 and Value2: {correlation}\n")

# Stop the Spark session

spark.stop()

Discussion:

• big_data.csv represents a large dataset with various columns.
• The program uses PySpark to perform basic analytical operations, including calculating
the total count of rows, summary statistics, grouping by a categorical column, filtering
data based on a condition, and calculating the correlation between two numerical
columns (a sketch of collecting the grouped result for further use follows this discussion).
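
As a final hedged step, the grouped averages computed above could be pulled back to the driver as a pandas DataFrame for reporting, assuming the aggregated result is small enough to fit in memory and that this runs before spark.stop(). The 'average_by_category.csv' file name is an assumption for illustration; 'category' and 'value' are the same placeholder columns used in the program.

# Sketch: collect the small aggregated result to the driver for reporting
avg_pdf = average_by_category.toPandas()
print(avg_pdf.head())

# The pandas DataFrame can then be saved, plotted, or shared as a simple report
avg_pdf.to_csv("average_by_category.csv", index=False)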

POST EXPERIMENT QUESTIONS:

1. How do you filter data based on a condition and calculate the correlation between two
numerical columns?

2. How would you work with a large dataset containing various columns?
