CS 4407 Programming Assign. Unit 2
UNIT 2: Tools and Technologies for Data Mining and Machine Learning
Feature     | Traditional Database (Relational)     | Analytical Database (Data Warehouse)     | NoSQL Database
Scalability | Vertical (scaling up hardware)        | Horizontal (scaling out by adding nodes) | Horizontal (designed for distributed systems)
Examples    | MySQL, PostgreSQL, Oracle, SQL Server | Teradata, Snowflake, Amazon Redshift     | MongoDB, Cassandra, Redis, Neo4j
Traditional databases are optimized for managing transactions and maintaining data
integrity. Think of them as the workhorses for applications where data needs to be
accurate and consistent right now, like banking systems or order processing (PingCAP,
2024).
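The consistency guarantee described above can be illustrated with a small sketch. It assumes an in-memory SQLite database standing in for a production relational system, and the `accounts` table and the `transfer` and `balances` helpers are hypothetical names chosen for this example. The point is only that a failed transfer leaves no partial update behind.

```python
import sqlite3

# In-memory SQLite stands in for a transactional relational database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` atomically; return True on success."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # An overdraft violates the CHECK constraint and raises here,
            # which also undoes the credit applied just above.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

Calling `transfer(conn, 1, 2, 500)` is rejected because account 1 cannot cover it, and `balances(conn)` afterwards shows both accounts unchanged rather than one credited and the other not debited.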
Analytical databases are designed for complex queries and analysis of large volumes of
historical data. They are used to understand trends, patterns, and insights to support
decision-making. Imagine a data warehouse used to analyze sales data over several years
(Jérémy, 2024).
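As a rough illustration of the kind of query an analytical database serves, the sketch below totals sales per year. It assumes an in-memory SQLite table standing in for a data warehouse, and the `sales` table, its columns, and the sample rows are made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for a data warehouse (assumption); rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2022-03-01", "North", 120.0),
        ("2022-09-15", "South", 80.0),
        ("2023-01-20", "North", 200.0),
        ("2023-06-05", "South", 150.0),
        ("2024-02-11", "North", 310.0),
    ],
)


def yearly_totals(conn):
    """Aggregate historical sales into one total per calendar year."""
    rows = conn.execute(
        "SELECT substr(sale_date, 1, 4) AS year, SUM(amount) AS total "
        "FROM sales GROUP BY year ORDER BY year"
    )
    return [(year, total) for year, total in rows]
```

The query reads many rows and returns a few summary values, which is the opposite access pattern from the single-row updates of a transactional system.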
NoSQL databases offer flexibility and scalability for handling various data types and
large volumes of data. They are often used for applications with specific needs, like social
(Real-World NoSQL Database Use Cases: Examples and Use Cases for Developers | DataStax, 2025).
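The schema flexibility mentioned above can be sketched without any particular NoSQL product. The example below is a toy, assuming a plain Python dict standing in for a document collection; `insert`, `find`, and the sample documents are hypothetical and are not the API of MongoDB, Cassandra, Redis, or Neo4j.

```python
import json

# A dict keyed by document id stands in for a document collection (assumption).
collection = {}


def insert(doc_id, document):
    """Store a self-describing document; no fixed schema is enforced."""
    # Round-trip through JSON so each stored document is an independent copy.
    collection[doc_id] = json.loads(json.dumps(document))


def find(field, value):
    """Return sorted ids of documents whose `field` equals `value`."""
    return sorted(
        doc_id for doc_id, doc in collection.items() if doc.get(field) == value
    )


# Documents in the same collection may carry different fields.
insert("u1", {"name": "Ana", "follows": ["u2"]})
insert("u2", {"name": "Ben", "follows": [], "bio": "Runner", "verified": True})
insert("p1", {"author": "u2", "text": "Hello", "tags": ["intro"]})
```

Adding the `bio` and `verified` fields to one user required no schema change, which is the property that a relational table would handle with an `ALTER TABLE` or nullable columns.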
MySQL database. This includes demographics, service usage, billing information, and
customer service interaction logs. This is the operational data, constantly being updated.
Statistical Package (R): R is used for statistical modeling and predictive analytics. The
data from the MySQL database is extracted and loaded into R. R is chosen because of its
rich set of statistical libraries and visualization capabilities, ideal for building a churn
API (WEKA): WEKA, while often used as a standalone tool, can also be integrated via
its API. In this scenario, the churn prediction model developed in R is deployed through a
WEKA API. This allows other systems within the telecom company to access and use the
model. For example, a customer service application can use the API to get a churn risk
score for a customer in real time (Getting Started With WEKA REST API | W E K A,
n.d.).
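The request and response exchange described here can be sketched in a generic way. The example below does not use or reproduce the WEKA API: `handle_score_request`, the JSON field names, and the coefficients are all hypothetical, and the handler runs in-process rather than over HTTP. It only shows the shape of the interaction, where customer features go in and a churn risk score comes out.

```python
import json
import math

# Hypothetical coefficients; in the scenario they would come from the trained model.
INTERCEPT = -2.0
WEIGHTS = {"support_calls": 0.6, "months_as_customer": -0.05}


def handle_score_request(body: str) -> str:
    """Stand-in for a scoring endpoint: JSON features in, JSON risk score out."""
    features = json.loads(body)
    z = INTERCEPT + sum(w * float(features[name]) for name, w in WEIGHTS.items())
    risk = 1.0 / (1.0 + math.exp(-z))  # logistic function maps z into (0, 1)
    return json.dumps(
        {"customer_id": features["customer_id"], "churn_risk": round(risk, 3)}
    )


def get_churn_risk(customer_id, support_calls, months_as_customer):
    """What a customer service application might do with such an endpoint."""
    request = json.dumps(
        {
            "customer_id": customer_id,
            "support_calls": support_calls,
            "months_as_customer": months_as_customer,
        }
    )
    return json.loads(handle_score_request(request))["churn_risk"]
```

Keeping the model behind a narrow JSON interface means the calling application needs no knowledge of how the model was built or in which tool.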
1. Data Extraction: Data from the operational MySQL database is extracted, often using
SQL queries, and transformed into a format suitable for R. This might involve cleaning
the data, handling missing values, and aggregating information (Ethan, 2023).
2. Model Building: R is used to build a statistical model that predicts customer churn. This
involves feature engineering, model selection (e.g., logistic regression, random forests),
3. Model Deployment: The trained churn prediction model is deployed via the WEKA API.
4. Integration: The customer service application uses the WEKA API to send customer
data to the model and receive a churn risk score. This score can then be used to trigger
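Steps 1 and 2 above can be sketched end to end under some loud assumptions: an in-memory SQLite database stands in for the operational MySQL database, Python stands in for R, and a hand-written logistic regression replaces R's modeling libraries. The tables, columns, sample rows, and function names are all illustrative.

```python
import math
import sqlite3

# In-memory SQLite stands in for the operational MySQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, monthly_charge REAL, churned INTEGER);
    CREATE TABLE support_calls (customer_id INTEGER);
    INSERT INTO customers VALUES
        (1, 30.0, 0), (2, NULL, 0), (3, 80.0, 1),
        (4, 90.0, 1), (5, 35.0, 0), (6, 85.0, 1);
    INSERT INTO support_calls VALUES (1), (3), (3), (3), (4), (4), (4), (4), (6), (6), (6);
    """
)


def extract(conn):
    """Step 1: aggregate call logs per customer and impute missing charges."""
    rows = conn.execute(
        "SELECT c.id, c.monthly_charge, COUNT(s.customer_id), c.churned "
        "FROM customers c LEFT JOIN support_calls s ON s.customer_id = c.id "
        "GROUP BY c.id ORDER BY c.id"
    ).fetchall()
    known = [charge for _, charge, _, _ in rows if charge is not None]
    mean_charge = sum(known) / len(known)
    # Replace a missing monthly charge with the mean of the known values.
    return [
        (cid, mean_charge if charge is None else charge, calls, churned)
        for cid, charge, calls, churned in rows
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, epochs=2000, lr=0.1):
    """Step 2: fit logistic regression by batch gradient descent."""
    w = [0.0, 0.0, 0.0]  # bias, weight for charge / 100, weight for call count
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for _, charge, calls, churned in rows:
            x = [1.0, charge / 100.0, float(calls)]
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - churned
            for j in range(3):
                grad[j] += err * x[j]
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


def predict(w, charge, calls):
    """Return the modeled churn probability for one customer."""
    return sigmoid(w[0] + w[1] * charge / 100.0 + w[2] * calls)


rows = extract(conn)
weights = train(rows)
```

With this toy data, a customer with a high charge and several support calls scores above 0.5 and a low-charge customer with no calls scores below it. A real churn model would need far more data, a held-out evaluation set, and the feature engineering and model selection the list mentions.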
This example demonstrates a typical analytics system flow. Data originates in the operational
database. It's then extracted and used by a statistical package to build a predictive model. Finally,
the model is deployed via an API, making the insights available to other systems for real-time
decision-making. This entire process helps the telecom company to proactively identify at-risk
References
PingCAP. (2024, December 12). Vector Stores vs. Traditional Databases: A Detailed
%20consistency%2C%20and%20reliability.
Jérémy. (2024, July 8). Unlocking Insights: A Guide to Understanding Analytical Databases.
Toucan. https://www.toucantoco.com/en/blog/analytical-databases#:~:text=An%20analytical%20database%20is%20a,for%20the%20purpose%20of%20analysis
Real-World NoSQL Database use cases: Examples and use cases for developers | DataStax.
Simplilearn.com. https://www.simplilearn.com/r-vs-python-battle-of-programming-languages-article#:~:text=R%3A%20R%20has%20a%20rich,%2C%20data%20manipulation%2C%20and%20visualization.
with-weka/getting-started-with-weka-rest-api
Ethan, E. (2023, May 15). Understanding MySQL and ETL: A Comprehensive Overview.
https://portable.io/learn/mysql-elt
Peterka, P. (2025, February 11). Analytical Modeling: A Guide to Data-Driven Decision making.
SixSigma.us. https://www.6sigma.us/six-sigma-in-focus/analytical-modeling/