Business Analytics & Text Mining

Modeling Using Python

INTRODUCTION
Dr. GAURAV DIXIT
DEPARTMENT OF MANAGEMENT STUDIES


• Prediction and Evaluation

– The text mining modeling process is similar to the data mining modeling process
• Models are built from prior cases (the training partition)
• The built model is then used to predict unseen cases (the test partition)
– Model success is evaluated
• Based on performance on the test partition, which is not part of the model-building process
– This mechanism works well for most text mining scenarios
• However, a few special scenarios need extra care
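As a minimal sketch of this build-then-evaluate workflow, the Python below shuffles a few labeled documents into training and test partitions, builds a deliberately trivial model from the training partition only, and scores it on the held-out test partition. The function names, the toy documents, and the choice of a majority-class baseline as the "model" are illustrative assumptions, not the course's method.

```python
import random
from collections import Counter

def holdout_split(cases, test_fraction=0.25, seed=0):
    """Shuffle a copy of the cases and split them into (train, test)."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train_majority(train):
    """Trivial 'model': always predict the most frequent training label."""
    return Counter(label for _, label in train).most_common(1)[0][0]

def accuracy(predicted_label, test):
    """Fraction of test cases whose true label equals the prediction."""
    correct = sum(1 for _, label in test if label == predicted_label)
    return correct / len(test)

# Hypothetical labeled documents: (text, topic)
docs = [
    ("stocks rally as rates fall", "finance"),
    ("bank profits beat forecasts", "finance"),
    ("central bank holds interest rates", "finance"),
    ("bond yields climb again", "finance"),
    ("team wins the championship", "sports"),
    ("striker scores twice in final", "sports"),
    ("coach praises young squad", "sports"),
    ("markets slide on weak earnings", "finance"),
]

train, test = holdout_split(docs)   # test cases never touch model building
model = train_majority(train)
score = accuracy(model, test)
print(f"predicts {model!r}, test accuracy = {score:.2f}")
```

Any real classifier would replace `train_majority`; the point is only that the test partition is held out until evaluation.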

– Example: Topic assignment
• Assigning topics to news stories, such as financial or sports stories
• However, news stories might change over time
– Stories for the test partition should therefore be selected with their publication dates in mind
» Because the model-training process typically does not account for changes over time
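One way to respect publication dates is a chronological split: train on stories published before a cutoff date and test on stories published on or after it, so the evaluation mimics predicting future news. This is a minimal sketch under that assumption; the story records, field names, and cutoff are hypothetical.

```python
from datetime import date

def split_by_date(stories, cutoff):
    """Train on stories published before `cutoff`; test on the rest."""
    train = [s for s in stories if s["published"] < cutoff]
    test = [s for s in stories if s["published"] >= cutoff]
    return train, test

# Hypothetical news stories with publication dates
stories = [
    {"text": "stocks rally as rates fall", "topic": "finance", "published": date(2023, 1, 10)},
    {"text": "team wins the championship", "topic": "sports", "published": date(2023, 2, 3)},
    {"text": "bank holds interest rates", "topic": "finance", "published": date(2023, 4, 21)},
    {"text": "striker scores twice in final", "topic": "sports", "published": date(2023, 7, 15)},
    {"text": "markets slide on weak earnings", "topic": "finance", "published": date(2023, 9, 2)},
]

cutoff = date(2023, 6, 1)
train, test = split_by_date(stories, cutoff)
print(len(train), "training stories,", len(test), "test stories")
```

Unlike a random shuffle, this split never lets the model train on stories newer than the ones it is tested on.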

– Measurement of error
• Classical accuracy measures work well when all errors are weighted equally
• However, in problems such as topic assignment, not all errors carry equal weight
– Measures such as recall and precision are especially important in such scenarios
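For a single topic treated as the "positive" class, precision = TP / (TP + FP) (of the stories assigned the topic, how many truly belong to it) and recall = TP / (TP + FN) (of the stories that truly belong to the topic, how many were found). The sketch below computes both from hypothetical true and predicted labels; the function name and data are illustrative.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one topic treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical topic assignments for six stories
y_true = ["sports", "finance", "finance", "sports", "finance", "other"]
y_pred = ["finance", "finance", "sports", "sports", "finance", "finance"]

p, r = precision_recall(y_true, y_pred, positive="finance")
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.500 recall=0.667
```

Plain accuracy here is also 0.5 (3 of 6 correct), but it hides whether the "finance" errors are false alarms or missed stories, which is exactly what precision and recall separate.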

– Other tasks, such as clustering and information extraction, are
• Exploratory in nature
• Often performed using unsupervised methods
• Harder to evaluate objectively than prediction and classification tasks

• Further Comments on Text Mining

– Like data mining techniques, text mining techniques borrow heavily from statistical approaches
– The choice of learning method depends on
• How the data are prepared
• Experience with text and data science methods, which gives us direction
– This course focuses on prediction

• Python as a Data Science Platform

– A general-purpose programming language
– One of the most popular interpreted programming languages
– Currently among the fastest-growing programming languages in the world, owing to its
• Ease of learning
• Use in data science and artificial intelligence (AI)
• Large and active developer community

– A suitable language
• Not only for research, prototyping, and testing new ideas
• But also for building production systems
• An advantage over SAS and R, where code may need to be ported to another language for larger production systems
– Expected to overtake R as the most preferred platform for data science

– Jupyter Notebook will be used for the Python programming in this course
– Jupyter Notebook
• An open-source web application
• Used to create and share documents that contain live code, equations, visualizations, and narrative text
• Used primarily for:
– Data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more

• Python
– This course focuses on using the Python programming language and its data-oriented library ecosystem for analytics
– Well suited to application development (a higher-productivity language)
• Because it is an interpreted programming language
• Being interpreted, however, Python code runs substantially slower than code in compiled languages such as Java or C++

– Not suitable for highly concurrent, multithreaded applications, particularly those with many CPU-bound threads
• Due to the global interpreter lock (GIL) in the standard CPython interpreter
– Prevents the interpreter from executing more than one Python instruction at a time

• Python data ecosystem

– Important library packages
• NumPy, pandas, and matplotlib

• NumPy
– Short for Numerical Python
– For numerical computing in Python
– Contains
• Multidimensional arrays (ndarray), its primary data structure for storing data, and functions for manipulating that data
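As a small illustration of NumPy arrays in a text mining setting, the sketch below stores a toy document-term count matrix (rows are documents, columns are terms; the counts are hypothetical) and normalizes each row into term frequencies using vectorized operations and broadcasting, without explicit Python loops.

```python
import numpy as np

# Hypothetical term counts: rows = documents, columns = terms
counts = np.array([[2, 0, 1],
                   [0, 3, 1],
                   [1, 1, 0]])

doc_lengths = counts.sum(axis=1)           # total terms in each document
term_freq = counts / doc_lengths[:, None]  # broadcast: divide each row by its length

print(doc_lengths)  # [3 4 2]
print(term_freq.round(2))
```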

• pandas
– Name derived from panel data, an econometrics term
– For working with tabular or structured data

– Contains
• DataFrame
– A tabular, column-oriented data structure with both row and column labels
• Series
– A one-dimensional labeled array object
• Functionality to reshape, slice and dice, perform aggregations, and select subsets of
data
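The pandas structures listed above can be sketched with a tiny, hypothetical table of documents: a DataFrame with row labels, a column pulled out as a Series, label-based slicing, boolean selection of a subset, and a group-wise aggregation.

```python
import pandas as pd

# Hypothetical documents: topic and word count, indexed by document ID
df = pd.DataFrame(
    {"topic": ["finance", "sports", "finance", "sports", "finance"],
     "words": [120, 80, 200, 95, 150]},
    index=["d1", "d2", "d3", "d4", "d5"],
)

words = df["words"]                              # one column is a Series
middle = df.loc["d2":"d4", "words"]              # label slicing (inclusive of "d4")
long_docs = df[df["words"] > 100]                # boolean selection of a subset
mean_by_topic = df.groupby("topic")["words"].mean()  # aggregation per topic

print(mean_by_topic)
```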

• matplotlib
– For producing plots and other two-dimensional data visualizations
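A minimal matplotlib sketch: a bar chart of hypothetical story counts per topic. In a Jupyter Notebook the figure would normally display inline; to run as a plain script, this version selects the non-interactive Agg backend (which should not be used inside a notebook where inline plots are wanted) and writes the figure to an in-memory PNG.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display window needed
import matplotlib.pyplot as plt

topics = ["finance", "sports", "politics"]
counts = [12, 7, 5]  # hypothetical number of stories per topic

fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(topics, counts)
ax.set_xlabel("Topic")
ax.set_ylabel("Number of stories")
ax.set_title("Stories per topic (toy data)")
fig.tight_layout()

buf = io.BytesIO()
fig.savefig(buf, format="png")  # render the figure to PNG bytes in memory
plt.close(fig)
```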

• Python
– Other key library packages
• SciPy for scientific computing
• scikit-learn for machine learning (prediction-focused)
• statsmodels for classical statistics and econometrics (focused on statistical
inference)
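To connect scikit-learn's prediction focus to topic assignment, the sketch below chains TF-IDF features with a multinomial naive Bayes classifier, trains on a few hypothetical labeled headlines, and predicts topics for unseen text. This particular vectorizer/classifier pairing is one common choice picked for illustration, not necessarily the method used later in the course.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training headlines and their topics
train_texts = [
    "stocks rally as interest rates fall",
    "bank profits rise on strong lending",
    "central bank raises interest rates",
    "team wins the championship final",
    "striker scores twice in the final",
    "coach praises team after the win",
]
train_labels = ["finance", "finance", "finance", "sports", "sports", "sports"]

# Pipeline: text -> TF-IDF weighted term vector -> naive Bayes classifier
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

pred = model.predict(["rates and stocks", "the team scores in the final"])
print(pred)
```

With real data, the model would be fit on a training partition and scored on a separate test partition, as described at the start of this lecture.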

• Python: Other considerations

– Integrated Development Environments (IDEs) and Text Editors
• Spyder (free), an IDE that currently ships with Anaconda
– Similar to RStudio, which was used in previous courses

– In this course, we will use Python 3.7 or later
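A quick way to confirm that an environment meets this requirement is to inspect `sys.version_info` at the top of a notebook or script. The helper name below is illustrative.

```python
import sys

MIN_VERSION = (3, 7)  # minimum Python version stated for this course

def check_python_version(minimum=MIN_VERSION):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum

print(sys.version)  # full version string of the running interpreter
if not check_python_version():
    raise RuntimeError(
        f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+ required, "
        f"found {sys.version.split()[0]}"
    )
```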

• Course Roadmap
– Module I: General Overview of Text Mining
– Module II: Python for Analytics
– Module III: Data Preparation
– Module IV: Predictive Models for Text
– Module V: Retrieval and Clustering of Documents
– Module VI: Information Extraction
– Module VII: Conclusion

Key References

• Fundamentals of Predictive Text Mining
– By Sholom M. Weiss, Nitin Indurkhya, & Tong Zhang (2015)
• Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
– By Wes McKinney (2017)

Thanks…
