EJ1151898


Information Systems Education Journal (ISEDJ) 15 (5)

ISSN: 1545-679X September 2017


__________________________________________________________________________________________________________________________

Emergence of Data Analytics in the Information Systems Curriculum

Musa J. Jafar
[email protected]
Manhattan College
Riverdale, NY 10471 USA

Jeffry Babb
[email protected]

Amjad Abdullat
[email protected]

West Texas A&M University


Canyon, TX 79016, USA

Abstract

As a phenomenon of interest, impact, and import, there is little doubt that the pervasive expansion of
data is upon us as Information Systems educators. Concerns and topics such as Data Science, Data
Analytics, Machine Learning, Business Analytics, and Business Intelligence are now ubiquitous and often
situated as being the “next big thing.” Educators and practitioners who identify and resonate with
information systems, as a discipline, are watching these developments with interest. With data being
both input and output to so many concerns that intersect with the information systems discipline, several
themes emerge when considering what curriculum and pedagogy are appropriate. The role, position,
location, and shape of data science topics are considered. Curricular approaches are also discussed with
an eye to breadth and depth. Fundamental and existential questions are raised concerning the nature
of data science and what role the Information Systems discipline can play. We also discuss evidence
from cases. Case one involves a student business analytics competition and case two investigates how
information systems knowledge areas can appropriate data science as an integral component of many
competencies that exist solidly within the canon of Information Systems (IS) topics.

Keywords: IS Curriculum, Data Science, Data Analytics, Machine Learning, Business Analytics.

1. INTRODUCTION

In the late 1990s, the data mining discipline was viewed as a “single phase in a larger life
cycle” of Knowledge Discovery in Databases (Collier et al., 1998). The earliest offerings of data
mining courses that we could trace are Guo (1998) and Lopez (2001). Since then, data mining has
evolved from appearing as an elective course in an Information Systems (IS) curriculum (Lenox 2002,
Patel 2003, Goharian 2004, Musicant 2006, Jafar 2008, Asamoah 2015) to a minor area of study, a
co/dual-major, or even a fully independent degree program. Today, this degree program will
typically be referred to as Data Analytics, Business Intelligence or Business Analytics. Although
the content of the curriculum (or even of the individual courses) is emergent, and therefore not as
streamlined as is the case with the more mature disciplines of finance, accounting, and marketing,
we do see the need for an extended minor, a co/dual-major and/or an undergraduate

_________________________________________________
©2017 ISCAP (Information Systems & Computing Academic Professionals) Page 22
http://iscap.info; http://isedj.org

program in data analytics in the IS curriculum. In this paper we extend our consideration of the
data analytics subject by articulating the case and the requirements for a Master’s degree in data
analytics as an IS degree program.

In the past 10 years, terms and concepts popularly known as Big Data, Data Science, Data Analytics,
Machine Learning, Business Analytics and Business Intelligence have become lexically normalized in
the discussion of unfolding horizons that impact organizations, their use of information
technology, and the expected utility of their information systems. In both the academic and
corporate worlds, these terms are somewhat elusive as they mean different things to different
constituencies (O’Neil 2014).

We use the terms machine learning and data science to highlight the statistical, mathematical,
algorithmic and Computer Science aspects of the discipline, where the theory is established and the
algorithms are coded. We use the terms Data Analytics, Business Analytics or Business Intelligence
to emphasize the applications side of the discipline, where algorithms are understood and the
underlying computing software is comprehended and utilized to solve business problems, reveal
patterns and extract insights from the data. We use the term Big Data to emphasize the volume,
variety, velocity and veracity of data and the need for fault-tolerant computing platforms that can
manage large amounts of unstructured data, where daily tasks need to be parallelized, distributed,
load-balanced and processed, and their results combined. It is the layer of abstractions created to
hide the infrastructure code and manage these tasks. It is the Map-Reduce model and its
derivatives. This paper focuses on curricular issues as they relate to data (business) analytics.
We will use Data Analytics to mean both Data Analytics and Business Analytics.

Simon (2013) describes data analytics as “…the combination of statistics, mathematics,
programming, problem solving, capturing data in ingenious ways the ability to look at things
differently and the activity of cleansing, preparing and aligning data.” Conway (2010) summarized
it as the intersection of hacking skills and mathematics & statistics, combined with substantive
expertise. Conway’s (2010) Venn diagram elegantly and colorfully draws the boundaries between
machine learning, traditional research, data science and the danger zones. Figure 1 is a testimony
to the many intersections of the data analytics disciplines. For example, a self-driving car or a
pattern recognition system is an example of machine learning; however, a recommender system, or a
system (such as what drives the consumer experience when shopping on Amazon’s website) that is used
to reveal patterns, constitutes data analytics. Figure 3 is an illustrative attempt to disambiguate
the problem space.

The sum of these innovations in data management and use is both prescient and compelling in a
contemporary dialog on the themes and content that define the IS discipline. As an inter-discipline,
IS has commonly absorbed innovations over its history. Thus, it is quite normal for IS to develop an
existential conversation when new waves of innovation impact its shores. However, as a bridging
discipline between organizations, people, information and computing technology, data is
foundational to the discipline’s identity in an acute sense.

Normalizing the Discipline
Data is in the very bloodstream of an organization. Every aspect of business, government, science,
humanities, medicine, etc. has both a data and an analytics component. Further, having emerged out
of the various schools of business concerns, IS has always had a central focus on data and its
processing. Rhetorically, we can ponder “what do transaction processing and analytical processing
have in common and where do they bifurcate?” Arguably, their intersection and union revolve around
data. Transaction processing uses business models (rules) to manipulate the data. Simply put, it is
SQL-based data warehousing analytics. Analytical processing, on the other hand, relies on wider
ranges of data, including transaction processing and digital sensors (social media, web, apps,
government, etc.). Analytical processing uses statistical models to sift through and extend the use
of transactional data to produce insights and reveal patterns. We may even argue that analytical
processing would not have existed if it were not for the maturity and openness of transaction
processing systems.

Rush to Discipline
All of the business disciplines are exuberant over the prospects of data analytics. Further, this
exuberance is leading to a rapid refashioning towards data analytics, up to and including the
development of new programs. We could see a situation where no two Data Analytics degrees are the
same, as the tenets of the discipline are not focused, defined, or agreed upon. In many cases, this
problem persists with the IS discipline at large.

This leads to questions regarding who “owns” the data analytics topic. In academia, we can posit


that one common point of contention in academic institutions these days is the issue of ownership
and where the different courses should be housed. We can even ask where interdisciplinary programs
such as text analytics and social media analytics should be housed. In the Kuhnian (2012) sense,
the ebb and flow, evolution, and emergence of disciplines follow paradigmatic shifts. Disciplines
flourish or flounder according to need and environment, but also fad and fashion. Thus the
ownership question remains. Whether computer science calls it machine learning or data science,
mathematics and statistics call it data science or data analytics, or IS programs call it data
analytics or business analytics, we acknowledge that although the different disciplines have the
same concerns, they have different focuses.

Fundamentals
We now examine the nature of the concerns, trends, emerging disciplines, and other phenomena
surrounding the rush to data. We consider data analytics in terms of its first principles, and
contrast and compare these first principles to those of IS.

Let us reflect on the circular relationship between transaction processing systems and analytic
systems. To build transaction processing applications, data, business logic, reporting and
presentation are core concerns. To build analytical processing applications, data, algorithms,
model validation, discovery, insights and presentation are core concerns. Further, we can
characterize transaction and business processing algorithms as a transparent and open box, whereas
analytic processing can be seen as a translucent black box. In both cases, core concerns remain
data design, gathering, repurposing, conceptualizing, storing, retrieval, manipulation and
presentation. Some common present-day use cases come to mind which belie the complexity and systems
knowledge required to function effectively. Take the case of an Association Rules (Recommender)
System:
• Using R (a programming language and software environment for statistical computing and graphics)
and its packages, prepare the data so that it is in transaction form.
• Clean up the data to establish non-duplicate items in a basket. Try to interpret the results.
• Grapple with incompatibilities between package versions and R versions as the project evolves.
• Often, the data is poorly organized and error-prone as given/found/extracted. Null values, type
mismatches, and other data quality issues may take hours or even days to correct.
• Data consolidation and exploratory data analysis are challenging and may require a variety of
tools (R, Python, SQL, Excel, Excel pivot tables and Tableau).
• Multi-tasking and high task saturation commonly accompany the above steps, which are iterative in
nature and where jumps among the steps are common.
• We may also assume (which is most likely the case) that a transaction processing system provided
the input data to the recommender system.

Thus, actionable results are not simply a matter of firing up a data analytics computing engine,
connecting to data, and displaying tidy results on a dashboard. Rather, it is a matter of
integration. Classic problems related to software engineering and IS development are readily
evident. Basically, a data analytics project is just another software engineering project where
different people with different skills work together to produce a software product.

This narrative gives way to the central theses of this paper:
• What is data analytics?
• What does data analytics hold for IS, and vice versa?
• What makes data analytics different? (We have already transitioned through the
knowledge-engineering era and are phasing out the data warehousing era.)
• How have colleges and universities met the growing demand for data analytics skills?
• What is a good IS approach to incorporate data analytics into the curriculum?
• What have we learned from early experiences?

These questions come full circle to our context of IS education, as we must decide how, and to what
extent, data analytics will pervade the discipline. There are even questions related to the
appropriate level for engaging data analytics. Should data analytics gravitate more towards a
graduate-level concern? At the graduate level, given the demographics of the students, we may focus
on skill building and expanding the boundaries of their technical skill knowledge, contrasted with
their foundations in a given subject matter area of expertise (medical, financial, marketing,
education, learning, etc.). At the undergraduate level, the needs of foundational systems
development topics may “crowd out” the data analytics topics such that the subject may not be
feasibly or fully explored. Concomitantly, there is also a requisite level of intuition, driven by
tacit knowing and acumen in subject matter expertise wrought through long-term exposure to data,
necessary for success.
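
As an aside, the association rules (recommender) use case outlined earlier in this section can be
made concrete with a minimal sketch. It is written in Python rather than R and uses only the
standard library. The sample rows, the function names `to_baskets` and `pair_rules`, and the
support and confidence thresholds are all assumptions made for illustration; they do not come from
any particular package or project. The sketch groups raw rows into baskets, drops null and
duplicate items, and derives simple pairwise rules.

```python
from collections import Counter
from itertools import combinations


def to_baskets(rows):
    """Group (order_id, item) rows into baskets, dropping nulls and duplicates."""
    baskets = {}
    for order_id, item in rows:
        if order_id is None or not item:
            continue  # skip rows with a missing key or an empty item
        # Normalizing case and whitespace makes "Bread" and "bread " one item.
        baskets.setdefault(order_id, set()).add(item.strip().lower())
    return list(baskets.values())


def pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Return sorted (antecedent, consequent, support, confidence) tuples."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical raw rows, including a duplicate item and a null value.
rows = [
    (1, "Bread"), (1, "bread "), (1, "Milk"),
    (2, "bread"), (2, "milk"), (2, "eggs"),
    (3, "milk"), (3, None),
    (4, "bread"), (4, "eggs"),
]

baskets = to_baskets(rows)
rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Even in this toy form, most of the code deals with shaping and cleaning the data rather than with
the rule computation itself, which is consistent with the integration effort described above.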


That these questions remain open is evident from the diversity of the degree programs in the area.

2. ARTICULATING A DATA ANALYTICS CURRICULUM FOR IS

As there are heterogeneous inputs to and consumers of data analytics, it is natural for many
disciplines to become involved with data analytics and otherwise appropriate its benefits to suit
their ongoing core dialog. As such, given the nascence and emergence of the “field,” it is
difficult to know where data analytics should call home. Or, should we accept it as a new
discipline, as was afforded to IS at one time?

As such, degrees in data analytics are elusive as they cross-cut multiple concerns (Figure 1). Upon
early inspection, one would find that there is no uniformity in the course offerings of such
degrees across universities (content, course descriptions, requirements, prerequisites and
transferability). By contrast, more established disciplines - Marketing, Finance, Computer Science,
and Political Science - have established a reasonable degree of consistency in their curricula.

While data analytics is certainly consistent with most universities’ mission statements, there are
no clear imperatives that compel data analytics to locate in one area or another. Furthermore, even
the modalities of delivery are neither suggestive nor limiting. Accordingly, digital learning,
on-line programs, graduate programs and service to the local community – to attract professionals,
continuing and life-long learners, or even alumni – are all also neither prohibitive nor
suggestive. This paper’s authors attempt to articulate a proposal of what a master’s program in
data analytics might look like. This section will proceed to share the broad strokes of the
proposal and otherwise share our discoveries.

Program Modalities
Increasingly, accommodating a wide variety of delivery modalities is necessary. Thus, any of the
following modalities is recommended:
• An on-the-ground program,
• An on-line program,
• A hybrid of on-line and on-the-ground program.

Proposed Degree Requirements
For the purpose of this paper, we surveyed a representative sample of more than 13 graduate
programs and 8 undergraduate programs that offer degrees in Data Analytics, Business Intelligence,
or Business Analytics. We observed that there is more uniformity in the course offerings at the
graduate level as compared to the undergraduate. We compared the content of their courses, pulled
as many syllabi as we could, and read and compared their course descriptions. We summarized and
compared their course contents. Although all of these programs share a common theme, the structure
of the curriculum, the course requirements, prerequisites, course descriptions, computing
technologies, focus and depth of offering varied widely. We offer this as an indication of a
discipline in flux and under formation. The findings of this analysis are shown in Figure 2. Note
that the MIT Sloan School of Management just introduced a One-Year Masters of Business Analytics
program that is modular in nature (MIT-Sloan 2016). We were not able to collect complete
descriptions of the modules (beyond the Optimization Methods, Intro to Applied Probability, Data
Mining, the Analytics lab courses and the Analytics Edge, which is also a coursers.org course).
Since we did not have complete information about the degree program, we opted not to include it in
the graduate programs list.

From our review of these programs, the following curricular patterns emerge.
• Graduate programs typically require a total of 30-36 credit hours as follows: 21-24 core credits
plus 6-9 approved elective credits and a 3-credit Capstone Course.
• There is no uniformity of offerings across the undergraduate programs. Some programs just
rebranded and renamed their data warehousing ETL courses into Big Data-I and Big Data-II, or the
Business Statistics courses into Data Analytics-I and Data Analytics-II.

Students from a computing discipline such as Math, CIS, CS, and Statistics may have 6 approved
credits waived if they have completed the equivalent course work with a B grade or higher.
Candidate courses are Programming, Data Management and Statistical Data Analysis. The courses for
the degree are grouped into five separate categories as follows:
• Core Data Management: 3 Courses
• Core Statistical Data Analysis: 2 Courses
• Core Data Analytics: 2-3 Courses
• Core Capstone Course: One Course
• Electives: 2-3 Courses

Pre-Requisite Knowledge
Fundamental prerequisite knowledge from an undergraduate (or graduate, if a second master’s)
education would include knowledge of programming, statistics, and calculus. Students who do not
have the background knowledge would either take leveling courses or use identified bridging
courses, which may include approved Massive Open Online Courses (MOOC) offerings where a
certificate of


completion that satisfies the prerequisites can be obtained.

The Graduate Education Component
In this section we explicate a provisional design for a graduate curriculum in data analytics.

1. Three Core Data Management Courses

1.1 Programming for Data Analytics
Pre-req: Basic Programming Knowledge
Description: Reading data from different data sources & streams such as files, web-searching, etc.
Organizing, manipulating and repurposing data. Parsing in and writing out data in JSON format,
string manipulation using RegEx libraries, Sets, Arrays and dictionaries.
Technology: Python, RegEx, DOM, JSON.

1.2 Data Management for Data Analytics
Pre-req: Programming
Description: Fundamentals of sound database design, storing, manipulating & retrieving data,
Conceptual Data Modeling, SQL, Functional Dependencies, Data Normalization.
Technology: SQL, MySQL DBMS, and ER-Modeling tools.

1.3 Data Visualization for Data Analytics
Pre-req: Programming
Co-req: Statistics-I
Description: Provide an understanding of the different data types and their encoding schemes. Learn
data visualization principles and Mantra(s). Learn how to tell a story through data. Learn how to
detect insight through data, and learn and utilize current technologies to visualize large data
sets for the purpose of providing insight.
Technology: Excel, Tableau, WebGL, D3.js, R-ggplot and R-Shiny.

2. Two Core Statistical Data Analysis Courses

2.1 Statistics-I for Data Analytics
Pre-req: Business Calculus
Description: Probability distributions and their applications (geometric, binomial, Poisson,
uniform, normal, exponential, t, F and Chi-Squared), Sampling Statistics, Confidence Intervals,
Hypothesis Testing, Analysis of Variance and Linear Regression Modeling.
Technology: Excel, R and R-Packages.

2.2 Statistics-II for Data Analytics
Pre-req: Statistics-I
Description: The different types of Regression Models and Time Series forecasting. The course
emphasizes statistical computing.
Technology: Excel, R and R-Packages.

3. Three Core Data Analytics Courses

3.1 Principles of Data Analytics
Pre-req: Stats-I, Data Management
Co-req: Data Management
Description: Exploratory Data Analysis, Statistical Models, Classification and Prediction,
Clustering Analysis, Similarity Measures, Fitness of Models. Learn how to use machine learning
technologies to analyze data for the purpose of decision support.
Technology: Excel, R and R-Packages.

3.2 Advanced Data Analytics
Pre-req: Principles of Data Analytics, Stats-II
Description: Optimization Models, Social media analytics, Text analytics, advanced Analytics
Algorithms like SVM, Neural Networks, bootstrap models, Model Validation.
Technology: R, R-Packages, Gephi and NodeXL.

3.3 Big Data Analytics
Pre-req: Advanced Data Analytics
Description: Big Data Meets Data Science, Meets Data Analytics. Map-Reduce Model, Apache Spark,
MongoDB, NoSQL Model and Scale-out Models.
Technology: Cloud Computing, Apache Spark and MongoDB.

4. Capstone Course

4.1 Capstone Course
Co-req: Big Data Analytics
Description: Special Topics in Business Analytics with a comprehensive real-world project. The
project usually extends over 2 terms: a summer and the fall, or the spring and the summer. Although
the topics of the course are taught by one faculty member during one semester, a student might be
working on their project with another faculty member from the program or from an approved
discipline. The project may be in collaboration with a partner organization or a business.
Technology: Whatever is needed for a successful project portfolio.


5. Three Elective Courses (More electives can be added)

5.1 Legal and Ethical Issues of Data
Pre-req: Principles of Data Analytics
Description: Provide an understanding of the legal and ethical issues as they relate to data
storage, retrieval, access and sharing as well as analytics models. Copyright law and legal and
ethical ramifications are at issue when recommending choices that have economic, social,
environmental or legal impacts.

5.2 Project Management
Pre-req: Principles of Data Analytics
Description: A data analytics project is just another software engineering project that will
culminate in a software product. Deliverables, artifacts, timelines and resources need to be
tracked and managed.
Technology: Project Management Software.

5.3 Decision Modeling
Pre-req: Statistics-II
Description: Understand, formulate, solve and analyze optimization problems in the business domain
and its operations. Utilize Excel functions and macros to perform what-if analysis. Understand the
geometric interpretation of linear optimization problems. Use the Solver family of packages to
solve linear optimization problems. Formulate and solve network graph types of problems, especially
in the social media domain.
Technology: Excel, Solver, Gephi.

Figure 4 presents a visual of the course sequences and their dependencies.

3. TESTING THE ASSUMPTIONS – LESSONS LEARNED FROM TWO CASES

Student Data Analytics Competition Case
We continue our inquiry with a case description that illustrates some of the points made thus far
about the nature of a data analytics curriculum and its relationship to the IS curriculum. The case
is situated in a student data analytics competition held in New York City in the spring of 2016.
The objective of the two-day student competition was to provide insights into a data set that was
made available by an online discount retailer that aggregates luxury brands and offers them at
discounted prices. The student team had ten weeks to produce a poster that explains the nature of
the data, the problems that surround it, and their insights into the data. The team’s mentor (a
coauthor of this paper) provided guidance and acted as a chief architect, coach, and project
manager for the team. At the competition, the students presented their poster to a team of judges.
After this initial round of judging, the students were given a new, related data set that the team
had not seen previously. Each team then had twelve hours in a reserved space to analyze the new
data set, develop new insights, and prepare and submit a presentation the next morning (no
communication with the mentor was allowed). The team would then deliver a ten-minute presentation
to a team of judges. Both the new analysis and the preparation had to be completed within the
twelve-hour overnight period. There were 18 teams competing from various North American
universities.

The crux of the competition is to use the given data, and the context of the company providing the
data, to bring technical solutions to business problems. During the 10-week period, the students
and mentor approached the problem as though it were an IS development project, which requires
planning, skills development, and computing resources in order to produce artifacts and deliver a
product. The team of interest consisted of juniors and seniors, three members in total.
Collectively, the team demonstrated sound business, analytical and technical skills, and
dedication. Salient to the competition, two students were enrolled in a data mining course during
the semester of the competition. Relevant course topics included data mining applications,
algorithms and technologies (using R, R packages, and Tableau). Further, at the start of the
project, the team was already familiar with Python and its APIs, Data Modeling and SQL, SAS, R, and
Tableau, and everyone knew Excel very well. From an analytical perspective, the team was familiar
with data modeling concepts, business statistics, statistical modeling, Finance, Marketing, etc.

As the project progressed, the team’s depth of knowledge matured, as did an increasing awareness
that more concepts, knowledge, and technologies were yet to be learned. Concomitantly, as their
semester in data mining wore on, the team acquired more in-depth knowledge of R’s data mining
packages, ggplot (a graphing and plotting package for R), and Tableau (for visualization and visual
analytics). Furthermore, the students learned the SQL Server Analysis Services Business
Intelligence platform on the side and heavily utilized its capabilities in their analysis.


The poignant aspect of this case is the wonderful outcome whereupon this team won first prize in
the competition. Thus, we present the case not only to characterize and celebrate the student
accomplishment, but to highlight what can be learned from this experience on these main grounds:
• What can we generalize from the experience about a successful data analytics project?
• What are the implications of these generalizations for a data analytics curriculum?
• How can IS, the discipline at large, incorporate data analytics into the curriculum?
• Lastly, what does IS have to offer to data analytics?

Upon reflection, the ingredients of the students’ success can be attributed to the mix of student
talent, mentoring, technology infrastructure, competency in underlying computing technology, and
the utilization of data analytics skills with an eye for the foundational business problem. That
is, the combination of these factors put these students in a good position to prevail.

As we generalize from the students’ experience, it is important to note that their accomplishment
was entirely their own work. Grappling with the new data set, the twelve-hour overnight drill, the
insights produced, and the quality of the presentation were all accomplished with no input from the
mentor. While orientation, training, coaching and mentoring all occurred in the lead-up, the
student team ran the relay race.

Lessons from the Case
While the lessons from the case are myriad, a standout lesson certainly lies in the blend of skills
utilized, the quality of the individuals involved, and the overarching perspective assumed
throughout the competition.

First, the success of the project can be primarily accorded to a focus on utilization of the
correct computing tools, underscoring the true nature of a data analytics project – it is still
largely a software and systems problem. Knowledge of computing tools to analyze, clean, prepare and
repurpose the data was required in addition to knowledge about the nature of data. Furthermore, the
data is never encountered in a perfect state: the data requires review, cleaning and preparation.

Further, a business and subject-matter orientation was adopted, which is often characteristic of
many contemporary software and systems approaches – early and iterative delivery of working
artifacts in close cooperation and partnership with stakeholders.

Next, the team was aware of a blend of competencies that we can generalize away from the competition
setting and to several emergent concerns. The team understood the importance of APIs – addressing
them, consuming them, integrating their outputs into their analysis system, understanding how API
inputs could emanate from their system, etc. Often, these APIs are consumed through calls to REST
services where JSON is the carrier of the data. The team understood the general processes
surrounding software and systems development as well as the concepts, tools and techniques of data
analytics. For modeling, statistics were important, as were the modeling techniques inherent in
data mining. Finally, business acumen – an awareness of the business context of the problem and the
need to integrate business-oriented decision-making into and out of organizational management IS –
was a significant success factor. Through the poster and the presentation, they were able to tell a
story, reveal patterns and provide insights.

Some of the final lessons from this case pertain to curricular and disciplinary concerns. The
outline for a graduate curriculum in data analytics, given in a previous section, was supported by
the outcomes of the case. The curriculum for data analytics must blend data management, traditional
system development, data and statistical modeling, and the business context. As is the case with
computing, software, and systems, these elements span a spectrum where a universal curriculum of
data analytics may not coalesce cleanly. Some of these concerns may lean considerably into
traditional computer science and software engineering, and yet others may lean towards the spaces
occupied by IS. Additionally, some of the modeling may lean toward operations research. In the
discussion, we shall elaborate further on how the bridging and spanning nature of the IS discipline
is well-suited to provide leadership as data analytics emerges as an ongoing concern.

Twitter Text Analytics Case
The second case is an exposition on the requirements for a Twitter text analytics research project
that culminated in the case being published in The Case Research Journal of the North American Case
Research Association. This is shared to highlight typical challenges faced when undertaking a type
of research project typically used to characterize the power of data analytics. We highlight the
challenges in what seems, on the surface, to be a simple and straightforward endeavor – using
well-documented APIs to bring Twitter data into an application or data analysis context. It is a

common misconception that Twitter data are ready, as is, for text and social network analytics. Some software packages, like NodeXL (Hansen 2011), attempt to provide easy and straightforward access to the social network of a tweets dataset. However, in our experience, the matter is not simple unless the requirements for analysis are superficial. That is, repurposing Twitter data sets for analytics requires a wide variety of skills: skills in programming languages like Python, in statistical packages like R, and in text analytics packages like R-tm and R-weka. Occasionally, web-development skills are required for DOM scraping of web pages. This does not even account for the set of skills required to build effective systems for reporting, integration, and other interactions required to make use of this knowledge. Commonly, Twitter allows access to their data in multiple modalities:

1. Through the web-interface: Users search content based on screen-name, hashtags, mentions, body of text, etc. They can read content, take notes, print pages, and type counts (favorite, retweet, etc.) into an Excel spreadsheet. Content is not captured into structures in digital formats where it is easy to manipulate. The process becomes cumbersome when large datasets are desired.

2. Through the Twitter-API(s): These are well documented and afford programmers extensive real-time access to Twitter’s data. For instance, a programmer may use Python to retrieve JSON data structures which can be passed on to R for analysis. Some of the challenges with this modality are learning the protocols of the Twitter-API and communicating with it to request and store the content. It is important to note that these are significant computing skills that need to be acquired in order to utilize this modality.

3. Subscription and purchase: Based on usage agreements, users can purchase content from Twitter subsidiaries. Usually content is delivered in file fragments where each file may contain a bundle of tweets in JSON format plus some other trailing file meta-data that needs to be cleaned up. These file bundles often need to be combined (and cleaned up) in order to be useful.

The implication here is that the data will not move into and out of our data analytics systems without curation and coaxing.

Case Details
We continue with a discussion on how the Twitter data may be used in a typical data analytics context. In the project, we used Twitter data to analyze a crisis situation in a market-research study. Because we needed the full data set, we purchased 53,900 tweets from Gnip (a Twitter subsidiary) that comprehensively covered 40 consecutive days of the hashtags “XXXXX” or “YYYYYY”. During the analysis, we learned more about the twitterers, their agendas, and their narratives. The primary twitterer, and the creator of one of the hashtags that went viral, is Mr. John Dow. In this study, we wanted to know what John Dow’s social media interests on Twitter were the day before he brought these hashtags to life. The challenge lies in the fact that the Twitter timeline API allows access to a limited number of historical tweets from a user’s timeline; because Mr. Dow is an avid twitterer, the limited number of historical tweets was insufficient. Other than purchasing the content, it seemed that we could not answer that question. Initially, our only option was to scrape the data from twitter.com search. Using Google Chrome with the advanced search option, we were able to display the tweets on the web page. With Chrome Inspector, we pinned the content and saved it as an HTML text document. Using Python, we wrote code that parsed the content of the file for the id(s) of the tweets. Then, using the Twitter API functions, we iterated through the id(s) and extracted the full content of the tweets. We were then able to perform the needed text analytics for the paper.

Lessons from the Case
The most poignant lesson to emerge from the case described above is the issue of federation. Data will not always cleanly answer questions, and data analytics fundamentally exists to facilitate questioning that leads to information and decision-making. There are two important “gravities” here: the need to work with computing tools for data manipulation, and the need for discretion in subject-matter expertise that informs the extent to which we will strive for the needed data. These worked together in the case in that the subject-matter was not the social data, but the knowledge of how data can be obtained and maintained.

4. CONCLUSION

In summary, to conceive of and design a Master of Science in Data Analytics is to engage an emergent and multi-disciplinary phenomenon. We must embrace multiple imperatives and paradigms, the objective of which is for students
to master, at least, the tools and techniques of both data science methods and their computing.

This objective requires addressing knowledge areas from multiple disciplines. Data Analytics represents the intersection of Computing, Statistics, and other disciplines from Humanities, Business, Science, Bio-Informatics, Learning Analytics, Natural Language Processing, Social Networks Analysis, etc. At the same time, a data analytics project is a software engineering project that requires different people from different disciplines with different skills working together to build a data product that is useful, usable and maintainable. Although we do not expect everyone in the project to be equally skillful, knowledge of analysis and design and project management principles is essential.

To build data products and to transform data into insights that can be utilized, data engineers take large data sets from different sources that need to be repurposed, cross-referenced, analyzed and presented. The same project might require a variety of computing machinery and machine learning algorithms to exploit the underlying structures of the data. Exploratory data analysis requires data visualization tools and dashboards, statistics and machine learning algorithms for the purpose of finding patterns that can be exploited within the underlying disciplines. Different data types require different types of repurposing techniques, analysis and discovery methods. Numeric data are statistical; social network data are centered on graphs, trees and associations; textual data require natural language processing, named-entity recognition and sentiment analysis. Ensuring a reliable computing and hardware infrastructure is itself a challenge.

In conclusion, we hope that we made the case that a Master’s degree in data analytics is the right place for IS educators to start the assimilation of the data analytics phenomenon into their discipline, departments, and programs. The master’s level is where the most holistic picture of what data analytics will mean for our discipline and its premise emerges. We also hope that we provided a template and road map of a model curriculum for a Master’s degree in data analytics based on our experience and our research of the programs we analyzed. However, at the undergraduate level an extended minor or a co/dual-major (Programming, Data Management, Advanced Statistics, Principles of Business Analytics plus another elective) gives undergraduate students a reasonable amount of knowledge and awareness of the application of data analytics in their area(s) of specialization.

Finally, we even created a poster that highlights the different aspects of the degree program (Figure 5).

5. REFERENCES

Asamoah, D., Doran, D., & Schiller, S. (2015). Teaching the Foundations of Data Science: An Interdisciplinary Approach. Pre-ICIS SIGDSA Workshop. Retrieved Aug 15, 2016 from https://works.bepress.com/daniel_asamoah/14/

Bastian, M., Heymann, S., & Jacomy, M. (2009). Gephi: an open source software for exploring and manipulating networks. International AAAI Conference on Weblogs and Social Media.

Chiang, R., Goes, P., & Stohr, E. (2012). Business Intelligence and Analytics Education, and Program Development: A Unique Opportunity for the Information Systems Discipline. ACM Transactions on Management Information Systems, 3(3), 12:1-12:13.

Collier, K., Carey, B., Grusy, E., & Marjaniemi, C. (1998). A Perspective on Data Mining. Retrieved Dec. 18, 2016 from http://www.insight.nau.edu/downloads/DM%20Perspective%20v2.pdf

Conway, D. (2010). The Data Science Venn Diagram. Retrieved Aug 20, 2016 from http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

Guo, Y. (1998). Data Mining: Theory and Practice. Retrieved Dec. 20, 2016 from http://www.doc.ic.ac.uk/~yg/course/dmml/

Hansen, D., Shneiderman, B., & Smith, M. (2011). Analyzing Social Media Networks with NodeXL. Elsevier Inc.

Hirschheim, R., & Klein, H. K. (2012). A glorious and not-so-short history of the information systems field. Journal of the Association for Information Systems, 13(4), 188.

Goharian, N., Grossman, D., & Raju, N. (2004). Extending the undergraduate computer science curriculum to include data mining. Proceedings of the International Conference on Information Technology: Coding and Computing, 2, p. 251.

Jafar, M. J., Anderson, R. R., & Abdullat, A. (2008). Data mining methods course for computer information systems students. Information Systems Education Journal, 6(48).

Kuhn, T. S. (2012). The structure of scientific revolutions. University of Chicago Press.

Lopez, D., & Ludwig, L. (2001). Data Mining at the Undergraduate Level. Proceedings of the Midwest Instruction and Computing Symposium.

Musicant, D. R. (2006). A data mining course for computer science: primary sources and implementations. SIGCSE 2006: Proceedings of the SIGCSE technical symposium on computer science education.

Lyytinen, K., & Yoo, Y. (2002). Ubiquitous computing. Communications of the ACM, 45(12), 63-96.

O’Neil, C., & Schutt, R. (2014). Doing Data Science. O’Reilly, USA.

Patel, N. (2003). 15.062 Data Mining, Spring 2003. Retrieved Dec. 15, 2016 from http://ocw.nur.ac.rw/OcwWeb/Sloan-School-of-Management/15-062Data-MiningSpring2003/CourseHome/index.htm

Simon, P. (2013). Too Big to Ignore: The Business Case for Big Data (Vol. 72). John Wiley & Sons.

Lenox, T. L. (2002). Development of a Data Mining Course for Undergraduate Students. ISECON-2002.


Appendix

Figure 1: (Conway 2010, Germiger 2014) Data Analytics is Cross Cutting.


Figure 2. Topic map against existing programs; the darker the color, the heavier the emphasis.

Programs compared (columns 0 to 13): Proposed (Data Anal), Columbia (Data Sc), CUNY (Data Anal), Fordham (Bus Anal), NYU (Bus Anal), Stevens (BI & Anal), CMU (BI & Data Anal), UCB (Data Sc), NCSU (Analytics), Stanford (IM & Anal), N. Western (Pred Anal), York (Bus Anal), G. W (Bus Anal), UMUC (Data Anal).

Body of Knowledge (rows): Probability Theory, Statistics, Advanced Statistics, Decision Modeling & Algorithms, Programming, Statistical Programming, Advanced Programming, Data Management, Data Visualization, Data Mining, Advanced Data Mining, Big Data Analytics, Social Networks Analysis, Text Analytics, Web Analytics, Machine Learning, Capstone, Ethics of Data, Data Warehousing, Project Management.

Prior Knowledge (rows): Calculus, Linear Algebra, Programming, Statistics, elective.

Figure 3. Differentiating the Problem Space
(http://www.simplilearn.com/data-science-vs-big-data-vs-data-analytics-article)


Figure 4. Course Sequences and Dependencies
Courses shown: Programming for Data; Data Visualization; Statistics-I; Data Management; Principles of Data Analytics; Statistics-II; Advanced Data Analytics; Decision Modeling; Big Data Analytics; Project Management; Capstone Course; Legal and Ethical Issues.
Note: Co-requisite courses can be taken concurrently.
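
As a supplementary sketch for the Twitter Text Analytics case, the clean-up of purchased content (file fragments that each hold a bundle of tweets in JSON plus trailing meta-data) might look like the following. The layout is an assumption made for illustration, not the documented delivery format: one JSON object per line, with tweet records carrying an `id` field and meta-data records lacking one. The names `load_fragment` and `combine_fragments` are hypothetical and belong to no Twitter or Gnip tooling.

```python
import json


def load_fragment(lines):
    """Return the tweet records found in one file fragment.

    Assumed layout (hypothetical): one JSON object per line; tweets carry
    an "id" field, while trailing meta-data records do not.
    """
    tweets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip debris that is not valid JSON
        if isinstance(record, dict) and "id" in record:
            tweets.append(record)
    return tweets


def combine_fragments(fragments):
    """Merge several fragments, keeping the first copy of each tweet id."""
    combined = {}
    for lines in fragments:
        for tweet in load_fragment(lines):
            combined.setdefault(tweet["id"], tweet)
    return list(combined.values())
```

Because an open text file iterates line by line, each fragment can be passed as a file object; the combined list can then be written out in whatever tabular form the downstream statistical package expects.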


Figure 5. The Poster
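
As a supplementary sketch for the Case Details of the Twitter Text Analytics case, the step of parsing a saved search page for the id(s) of the tweets could be approached as below. This is not the code used in the study. It assumes, hypothetically, that each tweet element in the saved HTML carries its numeric id in a `data-tweet-id` attribute; the attribute name is a parameter so it can be changed to match the actual markup. The `batches` helper only groups ids for later lookup and makes no call to any Twitter API.

```python
from html.parser import HTMLParser


class TweetIdParser(HTMLParser):
    """Collect numeric values of an id-bearing attribute, in page order."""

    def __init__(self, attr_name="data-tweet-id"):  # attribute name is an assumption
        super().__init__()
        self.attr_name = attr_name
        self.ids = []
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == self.attr_name and value and value.isdigit():
                if value not in self._seen:  # drop repeated ids
                    self._seen.add(value)
                    self.ids.append(value)


def extract_tweet_ids(html_text, attr_name="data-tweet-id"):
    """Return the unique tweet ids found in a saved HTML document."""
    parser = TweetIdParser(attr_name)
    parser.feed(html_text)
    parser.close()
    return parser.ids


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

The ids are kept as strings so that large identifiers are not altered on their way to whichever lookup function is used to retrieve the full content of the tweets.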
