Bachelor of Technology in Computer Science
and Engineering
Report
On
Data Analysis in Big Data Using Python
Name: Tushar Verma
Admission No: 21SCSE1310012
Under the Guidance of Dr. K Suresh
Introduction:
Big data analytics is the process of collecting, examining, and analyzing large
amounts of data to discover market trends, insights, and patterns that can help
companies make better business decisions. These insights become available
quickly and efficiently, so companies can be agile in crafting plans to
maintain their competitive advantage.
Objective:
Big data analytics describes the process of uncovering trends,
patterns, and correlations in large amounts of raw data to help make
data-informed decisions. These processes use familiar statistical
analysis techniques—like clustering and regression—and apply them
to more extensive datasets with the help of newer tools.
Technologies Used:
Python programming language
PySpark library for distributed data processing
Integrated Development Environment (IDE) such as PyCharm or
Jupyter Notebook
Implementation Details:
a. Setting up the Environment:
Install Python and the PySpark library.
Create a new Python script or project in your preferred IDE.
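As a quick sanity check that the environment is ready, the following sketch
imports PySpark and prints its version:
# Verify that PySpark is installed and importable
import pyspark
print("PySpark version:", pyspark.__version__)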
b. Importing Required Libraries:
Import the necessary libraries, including SparkSession from pyspark.sql and
the pyspark.ml modules used in the machine learning step.
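The examples in this report rely on the following imports, taken from the
Source Code section below:
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans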
c. Initializing the SparkSession:
Create a SparkSession, setting the application name and any other
configuration options.
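A minimal sketch of creating the session is shown below; the shuffle-partition
setting is an assumption suited to small local runs, not a requirement:
# Create (or reuse) a SparkSession with an application name
spark = (SparkSession.builder
         .appName("BigDataAnalysis")
         .config("spark.sql.shuffle.partitions", "8")  # assumption: small local run
         .getOrCreate())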
d. Loading the Data:
Read the input file into a DataFrame. Spark can infer the schema
automatically, or you can supply one explicitly for faster, more
predictable loads.
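For large files, supplying an explicit schema avoids a full inference pass
over the data. The column names and types below are assumptions chosen to
match the placeholders used later in this report:
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Hypothetical schema matching the placeholder columns used below
schema = StructType([
    StructField("column_name", StringType(), True),
    StructField("numeric_column", DoubleType(), True),
])
data = spark.read.csv("path/to/bigdata.csv", header=True, schema=schema)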
e. Exploring the Data:
Inspect the schema and preview a few rows to confirm the data loaded as
expected. Count the rows to get a sense of the dataset's size.
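A few quick checks confirm the load succeeded:
data.printSchema()                      # column names and types
data.show(5)                            # preview the first five rows
print("Number of rows:", data.count())  # total row count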
f. Aggregating the Data:
Group the rows by a key column and compute summary statistics, such as the
sum of a numeric column. Define helper functions for any aggregations that
are run repeatedly.
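A sketch using the pyspark.sql.functions helpers, which allow named result
columns; the grouping and numeric columns are the same placeholders as in the
Source Code section:
from pyspark.sql import functions as F

# Sum a numeric column and count rows within each group
agg_result = data.groupBy("column_name").agg(
    F.sum("numeric_column").alias("total"),
    F.count("*").alias("rows"),
)
agg_result.show()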
g. Filtering the Data:
Select only the rows that satisfy a condition, such as a numeric column
exceeding a threshold. Filtering early keeps the later stages of the
pipeline small.
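A sketch of a compound filter; the threshold of 100 is illustrative, as in
the Source Code section:
# Keep rows with a large value in the numeric column and a non-null key
filtered_data = data.filter(
    (data["numeric_column"] > 100) & (data["column_name"].isNotNull())
)
filtered_data.show()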
h. Joining Datasets:
Combine the DataFrame with a second dataset on a common column.
Choose the join type (inner, left, or right) that matches the question
being asked.
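A sketch of an inner join against a hypothetical second file;
path/to/otherdata.csv and common_column are assumed placeholders:
# Load a second DataFrame and join on the shared key column
another_data = spark.read.csv("path/to/otherdata.csv", header=True, inferSchema=True)
joined_data = data.join(another_data, on="common_column", how="inner")
joined_data.show()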
i. Applying Machine Learning:
Assemble the feature columns into a single vector column, then fit a model
such as KMeans clustering. Inspect the resulting cluster assignments to
interpret the groups the model found.
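A sketch of the clustering step; feature1 and feature2 are placeholder
numeric columns, as in the Source Code section:
# Combine the feature columns into a single vector column
assembler = VectorAssembler(inputCols=["feature1", "feature2"], outputCol="features")
features_data = assembler.transform(data)

# Fit a two-cluster KMeans model and inspect the assignments
kmeans = KMeans(k=2, seed=0)
model = kmeans.fit(features_data)
model.transform(features_data).select("features", "prediction").show()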
j. Stopping the SparkSession:
Stop the SparkSession once the analysis is complete so that cluster
resources are released.
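Results can optionally be persisted before shutting down; the output path
below is an assumed placeholder:
# Write a result DataFrame to disk, then release cluster resources
agg_result.write.csv("path/to/output", header=True, mode="overwrite")
spark.stop()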
Challenges Faced:
During the development of the data analysis project, the following
challenges were encountered:
Inferring a correct schema for large or inconsistently formatted input files.
Managing memory and partitioning so that operations remain responsive on
data that does not fit on a single machine.
Debugging lazy transformations, since errors often surface only when an
action such as count() or show() is executed.
Conclusion:
The data analysis project successfully demonstrates the
development of a simple yet practical big data pipeline using Python and
PySpark. By following the implementation details outlined in this report,
users can build their own analysis workflow and extend it with additional
transformations, aggregations, and models. The project provides a solid
foundation for understanding big data concepts and Python programming
techniques.
Source Code:
from pyspark.sql import SparkSession
# Create a SparkSession
spark = SparkSession.builder \
.appName("BigDataAnalysis") \
.getOrCreate()
# Load the big data file into a DataFrame
data = spark.read.csv("path/to/bigdata.csv", header=True, inferSchema=True)
# Perform data analysis operations
# Example 1: Count the number of rows in the DataFrame
row_count = data.count()
print("Number of rows:", row_count)
# Example 2: Perform aggregations
agg_result = data.groupBy("column_name").agg({"numeric_column": "sum"})
agg_result.show()
# Example 3: Apply filters
filtered_data = data.filter(data["column_name"] > 100)
filtered_data.show()
# Example 4: Perform joins
# another_data is a second DataFrame; the path below is a placeholder
another_data = spark.read.csv("path/to/otherdata.csv", header=True, inferSchema=True)
joined_data = data.join(another_data, data["common_column"] == another_data["common_column"], "inner")
joined_data.show()
# Example 5: Perform machine learning tasks (e.g., clustering, classification)
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler
# Prepare features for clustering
assembler = VectorAssembler(inputCols=["feature1", "feature2"], outputCol="features")
features_data = assembler.transform(data)
# Apply KMeans clustering
kmeans = KMeans(k=2, seed=0)
model = kmeans.fit(features_data)
# Get cluster predictions
predictions = model.transform(features_data)
predictions.show()
# Stop the SparkSession
spark.stop()
Note that this code assumes you have PySpark installed and a Spark environment available; Spark can run
locally for development or on a cluster for larger workloads. Additionally, you may need to modify the
code based on your specific data and analysis requirements.