CS 4407 Programming Assign. Unit 2

The document outlines the differences between traditional relational databases, analytical databases, and NoSQL databases, highlighting their structures, purposes, and examples. It also describes an analytics system for customer churn prediction in a telecom company, detailing the integration of a MySQL database, R for statistical modeling, and the WEKA API for model deployment. Overall, it illustrates the flow of data from operational databases to predictive modeling and real-time decision-making applications.

Uploaded by

Danial Naveed

University of the People

CS 4407- Data Mining and Machine Learning

UNIT 2: Tools and Technologies for Data Mining and Machine Learning

Programming Assign. Unit 2

Mary Barker (Instructor)

12th February 2025


1. Comparing Databases

| Feature | Traditional Database (Relational) | Analytical Database (Data Warehouse) | NoSQL Database |
|---|---|---|---|
| Data Structure | Structured (rows and columns in tables) | Structured (often star schema or snowflake schema) | Varies (document, key-value, graph, column-family) |
| Data Type | Primarily structured data | Structured and semi-structured, historical data | Unstructured, semi-structured, and structured |
| Purpose | Transaction processing (OLTP), data integrity | Decision support, business intelligence (OLAP) | Flexible data storage, scalability, specific use cases |
| Query Language | SQL | SQL (often extended) | Varies (e.g., JSON-like queries, specific API calls) |
| Scalability | Vertical (scaling up hardware) | Horizontal (designed for distributed systems) | Horizontal (scaling out by adding nodes) |
| Consistency | Strong consistency (ACID properties) | Eventual consistency (often relaxed for performance) | Varies (different consistency models) |
| Examples | MySQL, PostgreSQL, Oracle, SQL Server | Teradata, Snowflake, Amazon Redshift | MongoDB, Cassandra, Redis, Neo4j |

Key Differences Summarized:

• Traditional databases are optimized for managing transactions and maintaining data integrity. Think of them as the workhorses for applications where data needs to be accurate and consistent right now, like banking systems or order processing (PingCAP, 2024).

• Analytical databases are designed for complex queries and analysis of large volumes of historical data. They are used to understand trends, patterns, and insights to support decision-making. Imagine a data warehouse used to analyze sales data over several years (Jérémy, 2024).

• NoSQL databases offer flexibility and scalability for handling various data types and large volumes of data. They are often used for applications with specific needs, like social media platforms, real-time analytics, or IoT (Internet of Things) data (Real-World NoSQL Database Use Cases: Examples and Use Cases for Developers | DataStax, 2025).
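
The three workloads above can be contrasted in a few lines of Python. This is only a minimal sketch under stated assumptions: the built-in sqlite3 module stands in for a relational engine, a plain GROUP BY query stands in for a warehouse-style analysis, and a JSON string stands in for a document-store record. The table name, column names, and sample values are all hypothetical.

```python
import json
import sqlite3

# sqlite3 is only a stand-in for a relational engine; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, year INTEGER, amount REAL NOT NULL)"
)


def place_orders(rows):
    """OLTP-style write: either every row is committed or none is."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO orders (customer, year, amount) VALUES (?, ?, ?)", rows
        )


place_orders([("ana", 2023, 40.0), ("ben", 2023, 60.0), ("ana", 2024, 25.0)])

# The second row violates NOT NULL, so the whole batch is rolled back.
try:
    place_orders([("cy", 2024, 10.0), ("cy", 2024, None)])
except sqlite3.IntegrityError:
    pass

# OLAP-style read: aggregate historical rows instead of touching single records.
yearly_sales = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()

# Document-style record: nested, schema-free data serialized as JSON.
profile = {
    "customer": "ana",
    "devices": [{"type": "phone", "os": "android"}],
    "tags": ["prepaid"],
}
stored = json.dumps(profile)
loaded = json.loads(stored)
```

The rollback in the middle is the point of the transactional case: a failed batch leaves no partial data behind, which is the integrity guarantee the first bullet describes.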

2. Connecting Databases, Statistical Packages, and APIs in an Analytics System

The following example focuses on customer churn prediction at a telecom company:

• Database (Traditional - MySQL): The telecom company stores customer data in a MySQL database. This includes demographics, service usage, billing information, and customer service interaction logs. This is the operational data, which is constantly being updated.

• Statistical Package (R): R is used for statistical modeling and predictive analytics. The data from the MySQL database is extracted and loaded into R. R is chosen because of its rich set of statistical libraries and visualization capabilities, which make it well suited to building a churn prediction model (Priyadharshini, 2024).

• API (WEKA): WEKA, while often used as a standalone tool, can also be integrated via its API. In this scenario, the churn prediction model developed in R is deployed through a WEKA API. This allows other systems within the telecom company to access and use the model. For example, a customer service application can use the API to get a churn risk score for a customer in real time (Getting Started With WEKA REST API | W E K A, n.d.).

How They Relate:

1. Data Extraction: Data from the operational MySQL database is extracted, often using SQL queries, and transformed into a format suitable for R. This might involve cleaning the data, handling missing values, and aggregating information (Ethan, 2023).

2. Model Building: R is used to build a statistical model that predicts customer churn. This involves feature engineering, model selection (e.g., logistic regression, random forests), and model evaluation (Peterka, 2025).

3. Model Deployment: The trained churn prediction model is deployed via the WEKA API. This makes the model accessible to other applications.

4. Integration: The customer service application uses the WEKA API to send customer data to the model and receive a churn risk score. This score can then be used to trigger interventions, such as targeted promotions or proactive customer service outreach.
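
Steps 1 and 2 can be sketched in Python to make the flow concrete. This is an illustration under stated assumptions, not the R workflow itself: sqlite3 stands in for the operational MySQL database, the customers table and its rows are invented, missing usage values are filled with the column mean, and a small hand-written logistic regression replaces an R modeling library.

```python
import math
import sqlite3

# sqlite3 stands in for the operational MySQL database; schema and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(id INTEGER, monthly_calls REAL, complaints INTEGER, churned INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, 120.0, 0, 0), (2, 15.0, 4, 1), (3, 90.0, 1, 0), (4, None, 5, 1),
        (5, 20.0, 3, 1), (6, 110.0, 0, 0), (7, 30.0, 4, 1), (8, 100.0, 1, 0),
    ],
)

# Step 1: extract with SQL, then replace missing usage with the mean of known values.
rows = conn.execute(
    "SELECT monthly_calls, complaints, churned FROM customers ORDER BY id"
).fetchall()
known = [r[0] for r in rows if r[0] is not None]
mean_calls = sum(known) / len(known)
features = [
    [(r[0] if r[0] is not None else mean_calls) / 100.0, float(r[1])] for r in rows
]
labels = [r[2] for r in rows]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Return the predicted churn probability for one feature vector."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def train(features, labels, lr=0.1, epochs=2000):
    """Step 2: fit logistic regression with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            error = predict(weights, bias, x) - y
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


weights, bias = train(features, labels)

# Evaluation on the training rows only; a real project would hold out test data.
accuracy = sum(
    (predict(weights, bias, x) >= 0.5) == bool(y) for x, y in zip(features, labels)
) / len(labels)
```

Logistic regression is used here because the text names it as one candidate model; a random forest or any other classifier would slot into the same extract, clean, train, evaluate sequence.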


Overall Analytics System:

This example demonstrates a typical analytics system flow. Data originates in the operational database. It's then extracted and used by a statistical package to build a predictive model. Finally, the model is deployed via an API, making the insights available to other systems for real-time decision-making. This entire process helps the telecom company to proactively identify at-risk customers and take steps to reduce churn.
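
The final stage of this flow, where another system sends customer data and receives a churn risk score, might look roughly like the sketch below. It is not the WEKA API: the request handler, the field names, the model coefficients, and the 0.5 threshold are all hypothetical, chosen only to show the request, score, and intervention pattern.

```python
import json
import math

# Hypothetical coefficients, as if exported from a trained logistic regression model.
MODEL = {"bias": -4.0, "weights": {"complaints": 1.5, "monthly_calls": -0.01}}
RISK_THRESHOLD = 0.5  # assumed cut-off for triggering an intervention


def churn_score(customer):
    """Apply the logistic model to one customer record (a dict)."""
    z = MODEL["bias"] + sum(
        w * float(customer[name]) for name, w in MODEL["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def handle_score_request(body):
    """Take a JSON request body and return a JSON response body."""
    try:
        customer = json.loads(body)
        score = churn_score(customer)
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing fields, or non-numeric values.
        return json.dumps({"error": "invalid customer record"})
    action = "offer_retention_promotion" if score >= RISK_THRESHOLD else "none"
    return json.dumps(
        {
            "customer_id": customer.get("customer_id"),
            "churn_risk": round(score, 3),
            "action": action,
        }
    )


# How a customer service application might call the handler.
request = json.dumps({"customer_id": 42, "complaints": 4, "monthly_calls": 20})
response = json.loads(handle_score_request(request))
```

Keeping the handler a pure function from JSON text to JSON text means it could sit behind whatever web framework or API gateway the company already uses, and the calling application only needs to read the churn_risk and action fields.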

References

PingCAP. (2024, December 12). Vector Stores vs. Traditional Databases: A Detailed Comparison. TiDB. https://www.pingcap.com/article/vector-stores-vs-traditional-databases-a-detailed-comparison/#:~:text=Traditional%20databases%20are%20designed%20to,integrity%2C%20consistency%2C%20and%20reliability.

Jérémy. (2024, July 8). Unlocking Insights: A Guide to Understanding Analytical Databases. Toucan. https://www.toucantoco.com/en/blog/analytical-databases#:~:text=An%20analytical%20database%20is%20a,for%20the%20purpose%20of%20analysis

Real-World NoSQL Database use cases: Examples and use cases for developers | DataStax. (2025, January 31). DataStax. https://www.datastax.com/guides/nosql-use-cases

Priyadharshini. (2024, June 13). Battle of the Programming Languages: R vs Python. Simplilearn.com. https://www.simplilearn.com/r-vs-python-battle-of-programming-languages-article#:~:text=R%3A%20R%20has%20a%20rich,%2C%20data%20manipulation%2C%20and%20visualization.

Getting started with WEKA REST API | W E K A. (n.d.). https://docs.weka.io/getting-started-with-weka/getting-started-with-weka-rest-api

Ethan, E. (2023, May 15). Understanding MySQL and ETL: A Comprehensive Overview. https://portable.io/learn/mysql-elt

Peterka, P. (2025, February 11). Analytical Modeling: A Guide to Data-Driven Decision making. SixSigma.us. https://www.6sigma.us/six-sigma-in-focus/analytical-modeling/
