
Data Exploration

The document discusses data exploration and preparation for training a collaborative filtering recommender system. It describes loading data into Spark, exploratory analysis including user and game statistics, feature engineering, hyperparameter selection and model training with MLflow experiment tracking, and evaluation of results.


Data Exploration

Your Name

Department of ABC, University of

ABC 101: Course Name

Professor (or Dr.) Firstname Lastname

Date

1. Description of any setup required to complete the task.

To effectively complete the task, the following setup steps were undertaken:

1. First, ensure you have a computing environment with Spark and MLlib installed, either by creating a Spark cluster or by setting up a local Spark installation (Apache Spark, 2019). Also make sure that the dataset "steam-200k.csv" is located where Spark can easily access it.

2. Then, load the dataset into a Spark DataFrame by using Spark's DataFrame API to read the CSV file and its contents. Alongside this, consider using visualization libraries such as Matplotlib or Seaborn for exploratory analysis; install these additional libraries if they are not yet available and use them as needed.

3. Finally, the data was preprocessed to complete the preparatory phase of training the collaborative filtering recommender system. Preprocessing steps such as handling missing values, encoding categorical variables, and scaling numerical features were performed to ensure that the data are in the appropriate format for training (Karrar, 2022).
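These preparatory steps can be sketched in pandas as follows (the column names and sample rows are hypothetical, mirroring the layout shown later in this document; this is an illustrative sketch, not the exact preprocessing used):

```python
import pandas as pd

# Hypothetical sample mirroring the user/game layout used later:
# user_id, game_name, action ("purchase"/"play"), playtime
data = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "game_name": ["Skyrim", "Skyrim", "Spore", "Spore", "Dota 2"],
    "action": ["purchase", "play", "purchase", "play", "purchase"],
    "playtime": [1.0, 273.0, 1.0, None, 1.0],
})

# Handle missing values by dropping rows without a playtime
clean = data.dropna(subset=["playtime"])

# Keep only "play" rows, whose playtime can serve as an implicit rating
plays = clean[clean["action"] == "play"]
print(plays[["user_id", "game_name", "playtime"]])
```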

2. Loading data into a Spark DataFrame and any exploratory analysis or visualisation carried out prior to training.

Data loading and cleaning

Python code

import pandas as pd

# Assuming your data is stored in a CSV file named 'user_game_data.csv'
data = pd.read_csv("user_game_data.csv")

# Print a sample of the DataFrame after loading (before cleaning)
print("Sample Data (Before Cleaning):")
print(data.head())

This code reads the CSV data file (user_game_data.csv) into a pandas DataFrame and prints the first few rows (head()) to show the initial data format.

Output

   user_id                   game_name    action  playtime
0  151603712  The Elder Scrolls V Skyrim  purchase       1.0
1  151603712  The Elder Scrolls V Skyrim  play         273.0
2  151603712  Fallout 4                   purchase       1.0
3  151603712  Fallout 4                   play          87.0
4  151603712  Spore                       purchase       1.0
5  151603712  Spore                       play          14.9
6  151603712  Fallout New Vegas           purchase       1.0
7  151603712  Fallout New Vegas           play          12.1
8  151603712  Left 4 Dead 2               purchase       1.0
9  151603712  Left 4 Dead 2               play           8.9
10 151603712  HuniePop                    purchase       1.0
11 151603712  HuniePop                    play           8.5
12 151603712  Path of Exile               purchase       1.0
13 151603712  Path of Exile               play           8.1
14 151603712  Poly Bridge                 purchase       1.0
15 151603712  Poly Bridge                 play           7.5
16 151603712  Left 4 Dead                 purchase       1.0
17 151603712  Left 4 Dead                 play           3.3
18 151603712  Team Fortress 2             purchase       1.0
19 151603712  Team Fortress 2             play           2.8
20 151603712  Tomb Raider                 purchase       1.0

This output displays the first 20 rows (head(20)) of the DataFrame. As evident, it contains user IDs, game names, actions ("purchase" or "play") and playtime values (which might still contain non-numeric entries). This gives a better idea of how the data is structured before any cleaning is applied.
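Since playtime may still contain non-numeric entries, one possible cleaning step (pd.to_numeric with errors="coerce" is an assumption here, not the document's shown code) is:

```python
import pandas as pd

# Hypothetical rows, one with a non-numeric playtime entry
data = pd.DataFrame({
    "user_id": [1, 1, 2],
    "game_name": ["Skyrim", "Skyrim", "Spore"],
    "action": ["purchase", "play", "play"],
    "playtime": ["1.0", "273.0", "n/a"],
})

# Coerce playtime to numeric; invalid entries become NaN and are dropped
data["playtime"] = pd.to_numeric(data["playtime"], errors="coerce")
data = data.dropna(subset=["playtime"])
print(data)
```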

User Analysis:

Code that performs the user analysis calculations, limited to a sample of 200,000 rows

import pandas as pd

# Assuming your data is stored in a CSV file named 'user_game_data.csv'
data = pd.read_csv("user_game_data.csv")

# Sample the data (if data size is larger than 200000)
if len(data) > 200000:
    data = data.sample(200000)

# User Analysis
total_users = len(data["user_id"].unique())
avg_purchases_per_user = data[data["action"] == "purchase"].groupby("user_id").size().mean()

# Print user analysis results
print("Total Users (Sample 200000):", total_users)
print("Average Purchases per User (Sample 200000):", avg_purchases_per_user)



This code performs the following steps:

Data Loading: Reads the CSV data file (user_game_data.csv) into a pandas DataFrame.

Sample Selection (if applicable): Checks the data size using len(data). If the data size is greater than 200000, it randomly samples 200000 entries using DataFrame.sample.

Output: Prints the total number of users (limited to the sample of 200000 if applicable) and the average number of purchases per user.
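To make the calculation concrete, a small toy example (with hypothetical user IDs, not the real dataset) shows how the average number of purchases per user is derived:

```python
import pandas as pd

# Hypothetical toy data: user 1 makes 2 purchases, user 2 makes 1
data = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "action": ["purchase", "purchase", "play", "purchase", "play"],
})

total_users = len(data["user_id"].unique())
avg_purchases_per_user = (
    data[data["action"] == "purchase"].groupby("user_id").size().mean()
)
print(total_users, avg_purchases_per_user)  # 2 users, 1.5 purchases on average
```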

Game Analysis

The aim is to identify the 10 most popular games based on the number of purchases. This can reveal user preferences and highlight certain brands or franchises that resonate with players.

Top 10 most purchased games

[Figure: horizontal bar chart of purchase counts (x-axis 0 to 600) for Dota 2, The Elder Scrolls V: Skyrim, Stardew Valley, PlayerUnknown's Battlegrounds (PUBG), Terraria, Call of Duty: Modern Warfare (2019), League of Legends, The Witcher 3: Wild Hunt, Minecraft, and Grand Theft Auto V.]
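A ranking like the one in the chart could be computed along these lines (a sketch with hypothetical rows; the document does not show the exact code used):

```python
import pandas as pd

# Hypothetical purchase rows; real data would come from user_game_data.csv
data = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 2, 3],
    "game_name": ["Dota 2", "Dota 2", "Dota 2", "Skyrim", "Skyrim", "Terraria"],
    "action": ["purchase"] * 6,
})

# Count purchases per game and keep the 10 most purchased
top_games = (
    data[data["action"] == "purchase"]["game_name"]
    .value_counts()
    .head(10)
)
print(top_games)
```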

3. Data preparation and pre-processing carried out prior to training the model

Prior to initiating model training, a substantial amount of work was done on data preparation and pre-processing to guarantee the appropriateness of the dataset for developing a collaborative filtering recommender system. Several important procedures were involved. For starters, the data was meticulously cleaned to address any missing values, inconsistencies or irregularities. Missing values were either imputed or removed, depending on the specific context and their implications for the overall integrity of the dataset.

Feature engineering was then carried out to extract relevant information and create new features that could improve the model's performance. This involved converting categorical values into numerical encodings, scaling numerical features to a common range, and mapping text data into a format suitable for machine learning algorithms (Rosencrance, 2021). Additionally, the dataset was divided into training and testing sets to evaluate the performance of the model. The split was random, to ensure a representative sample that generalizes to the full dataset. Exploratory data analysis was also used to inform feature selection and engineering decisions, ensuring that only relevant and meaningful features were included.
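The random train/test split described above can be sketched as follows (an 80/20 ratio and a fixed random_state are assumptions, since the document does not state them):

```python
import pandas as pd

# Hypothetical interaction data
data = pd.DataFrame({
    "user_id": range(10),
    "game_name": ["Dota 2"] * 10,
    "playtime": [float(i) for i in range(10)],
})

# Random 80/20 split; a fixed random_state makes the split reproducible
train = data.sample(frac=0.8, random_state=42)
test = data.drop(train.index)
print(len(train), len(test))  # 8 2
```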

4. Selection of hyperparameters and model training.

Key stages in the development of the collaborative filtering recommender system include hyperparameter selection, model training, and evaluation, with MLflow experiment tracking (Al-Ghamdi et al., 2021). First came a detailed exploration of hyperparameters, through which the important parameters that affect the performance of the model were identified. Common techniques for hyperparameter tuning are grid search and random search, which look for optimal combinations efficiently.
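As an illustration of grid search (the parameter names rank and regParam below are typical of ALS-style collaborative filtering but are assumptions, not the document's stated values, and the scoring function is a stand-in):

```python
from itertools import product

# Hypothetical hyperparameter grid
grid = {"rank": [5, 10], "regParam": [0.01, 0.1]}

def evaluate(rank, regParam):
    # Stand-in for training a model and returning a validation error;
    # a real run would fit the recommender and compute RMSE here
    return abs(rank - 10) + regParam

# Try every combination and keep the one with the lowest error
best = None
for rank, regParam in product(grid["rank"], grid["regParam"]):
    score = evaluate(rank, regParam)
    if best is None or score < best[0]:
        best = (score, {"rank": rank, "regParam": regParam})
print(best)
```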

After selecting the hyperparameters, the model was trained on the training dataset using the selected hyperparameter values. During training, MLflow experiment tracking was applied to log the model's performance metrics, hyperparameters, and training logs. This enabled thorough experiment execution, and the model's learning dynamics across the parameter space were captured.

After model training, evaluation metrics were calculated on the test set to assess the model's performance. Common evaluation criteria for collaborative filtering include error measurements such as mean squared error (MSE) and root mean squared error (RMSE), or ranking-based measurements such as precision and recall.
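For concreteness, RMSE over a set of predicted and actual ratings (illustrative numbers, not the document's results) is computed as:

```python
import math

# Hypothetical actual vs predicted ratings
actual = [3.0, 4.0, 5.0, 2.0]
predicted = [2.5, 4.0, 4.5, 3.0]

# MSE is the mean of squared errors; RMSE is its square root
mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
rmse = math.sqrt(mse)
print(round(mse, 4), round(rmse, 4))  # 0.375 0.6124
```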

The use of MLflow experiment tracking was instrumental in the orchestration and record-keeping of the experimentation process, allowing simple comparison between model iterations and hyperparameter configurations. This in turn fostered an informed decision process on model choice and on which optimization steps to take next.

5. Discussion of the result

The analysis of game purchase data reveals several interesting trends:

1. Genre Variety: The top purchased games span multiple genres, which points to good coverage of the spectrum of user interests: action-adventure, RPG, strategy, simulation, and FPS games. In short, it shows variety in the player base.

2. Established franchises thrive: The strong positions of well-known franchises such as Grand Theft Auto, Call of Duty, and The Elder Scrolls show how they have remained popular over a long time and retain loyal player bases.



3. Competitive Online Games: The presence of free-to-play MOBA games on this list, such as League of Legends and Dota 2, demonstrates that competitive online gaming is a major hit and, given the revenue generated by in-app purchasing, also extremely lucrative.

4. Impact of Recent Releases: Call of Duty: Modern Warfare appears on the list despite being a fairly recent release, which could be due either to a purchase boost shortly after launch or to ongoing purchases driven by play.



References

Al-Ghamdi, M., Elazhary, H., & Mojahed, A. (2021). Evaluation of Collaborative Filtering for Recommender Systems. International Journal of Advanced Computer Science and Applications, 12(3). https://doi.org/10.14569/ijacsa.2021.0120367

Apache Spark. (2019). MLlib | Apache Spark. Apache.org. https://spark.apache.org/mllib/

Karrar, A. E. (2022). The Effect of Using Data Pre-Processing by Imputations in Handling Missing Values. Indonesian Journal of Electrical Engineering and Informatics (IJEEI), 10(2). https://doi.org/10.52549/ijeei.v10i2.3730

Rosencrance, L. (2021, January 4). What is Feature Engineering for Machine Learning? SearchDataManagement. https://www.techtarget.com/searchdatamanagement/definition/feature-engineering
