08-MBA-DATA ANALYTICS - Data Science and Business Analysis - Unit 1
SEMESTER-I
All rights reserved. No part of this book may be reproduced in any form without permission
in writing from Team Lease Edtech Pvt. Ltd.
CONTENT
STRUCTURE
1.2 Introduction
1.3.5 Privacy
1.9.4 Regression
1.9.5 Heteroskedasticity
1.10 Summary
● Formulate and use appropriate models of data analysis to uncover hidden solutions to
business-related challenges
● Interpret data findings effectively to any audience, orally, visually, and in written
formats
1.2 INTRODUCTION
George E. P. Box states that “All models are wrong, but some of them are useful.” So what exactly
is “data science?” You must have heard the term many times from various people. Though
there seems to be no exact definition of data science, a lot of people have coined definitions.
Before we begin, let us understand a few basic concepts.
The word data is of Latin origin, and its literal meaning is “that which is given.” Data can
come from a single source or from many sources. Over time, the word has been used in many
contexts, and each use has put forth a slightly different meaning. One of the most used and
proven definitions of data is “something given or admitted”: anything that is a conclusion of
a study, observation, analysis, or occurrence.
UNESCO has also defined data: according to it, data are concepts, facts, and illustrations
represented in a formal manner. This formal representation is suitable for interpretation and
communication, and it can be processed by humans or by automatic methods.
But that's not the end of it. Over time, requirements have changed and data has taken a special
place in society. From the smallest of spaces to huge rooms, data covers a lot of details,
materials, specifics, proofs, things, and places. Data has led to the evolution of data science
(the study of data), and data science has led to technologies such as big data. In today's world,
big data is used to run businesses, and hence it is important to understand and learn this
concept. The increasing use of the internet and social media has made data an integral part of
our lives. The internet and social media platforms have created many channels of
information. It has not just increased the scope of business and work but has also played an
important role in generating more data. The use of such data has been increasing ever since.
Be it YouTube, Twitter, Instagram, or any other platform, data creation, collection, and
analysis are everywhere.
But you can’t simply say that everything around us is data. It is just a piece of information
until you add and integrate it into a complete study. Before the integration of analytics, data
is nothing more than just noise. And this chapter is all about helping you identify the
difference between data and information. It also plays an important role in helping you
understand how important it is to have a complete model and theory of things. It is already
proven that we live in a world where data is everything. If you want to convert information
into data accurately, remember that it is all about application and analytics. The data you use
for your business, studies, or any other purpose is based on well-founded theories and expert
judgments that lead to conclusions, and convert noise into data that can be used for business
growth, learning, and much more.
With extensive networks and usage, data has become very important to mankind. Therefore,
data science, the study and collection of data, has also become very important. For quite
some time now, data science has been transforming businesses. Various companies have
been using not just normal data but medical data to gather insights and important information
about individuals. The CODATA Task Group on Accessibility and Dissemination of Data
(CODATA/ADD) founded in 1975 states that the need for data categorization arose due to
the variety of data available. This group has also developed several methods to classify data
into different categories.
Figure 1.1
According to the officials of the CODATA Task Group on Accessibility and Dissemination
of Data (CODATA/ADD), the various types of data according to science are:
Figure 1.2
There is a lot of data that can be categorized with time as its main factor. Time being the
main factor, there are broadly two types of data that can be defined. One is the Time
Independent Data and the second one is Time-Dependent Data. Let us briefly describe both.
Figure 1.3
● Time-independent data
Time-independent data can be understood from its name: it is data that is independent of
time. To understand this, consider geoscience and astronomy. In both these sciences, time is
not very important, as these fields deal with rocks, stars, and geological structures.
● Time-dependent data
The next category is time-dependent data, where events happen only once in a long while and
rarely recur. Since recurrence is infrequent, time is a very important factor.
● Location-independent data
You might have studied different concepts in Physics and Chemistry. All of those things/
concepts are based on analytical data and are independent of location. This is how science
defines data independent of location.
● Location-dependent data
On the other hand, there is a lot of data that depends on location. Everything you study in
astronomy or earth science belongs to this category. Likewise, people who study rocks and
their composition, or how old a particular rock is, are gathering data that is location-specific.
Hence, such data is location-dependent.
Now there is another category of data defined by science, and for data scientists, it is very
important to not just understand it, but also make use of this category in their work and
studies. Data is also defined and categorized based on the mode of generation. There are
three modes of data generation. These are listed below:
1. Primary Data - As the name suggests, primary data is obtained from observations and
experiments. These are taken from the values determined by the experiment based on
samples. For example, time, length, or velocity; all of these are primary data derived
after conducting experiments.
2. Derived Data - Then comes derived data. In many areas, derived data is also
known as reformatted data. It combines observations with theoretical models to reach
conclusions. As the name suggests, derived data is obtained from a set of observations.
Figure 1.4
3. Predicted Data - Theoretical data is also known as predicted data and is derived from
various theoretical calculations. In studies of this kind, data is obtained from rigorous
calculations that make use of basic or fundamental constants, e.g., any data that is
computed rather than observed, such as data involving celestial mechanics.
Figure 1.5
There are many "Vs" in big data: three of them are volume, velocity, and variety. Big data
pushes up against the computing ability of typical databases. This is the element of volume.
The scale at which data is produced is mind-boggling. Google's Eric Schmidt noted that until
2003, just five exabytes of data had been produced by all human beings (an exabyte is
1000^6 bytes, or a billion billion bytes). Today, we produce five exabytes of data every two
days.
The key explanation behind this is the proliferation of "interaction" data, a modern
development in comparison to "transaction" data. Interaction data is generated by tracking
events in our increasingly interactive everyday lives, such as browsing activity, geolocation
data, RFID data, sensors, personal digital recorders such as Fitbits and tablets, satellites, etc.
We are living in the "Internet of Things" (or IoT) today, and it is generating vast volumes of
information, all of which we seem to have an endless desire to examine. Additional Vs of big
data are discussed in other quarters.
Figure 1.6
A good data scientist will be adept at managing volume not just technically in a database
sense, but by building algorithms to make intelligent use of the size of the data as efficiently
as possible. Things change when you have gargantuan data because almost all correlations
become significant, and one might be tempted to draw spurious conclusions about causality.
For many modern business applications today, extraction of correlation is sufficient, but good
data science involves techniques that extract causality from these correlations as well.
Figure 1.7
In many cases, detecting correlations is useful as is. For example, consider the classic case of
Google Flu Trends, see Figure. The figure shows the high correlation between flu incidence
and searches about "flu" on Google; see Ginsberg et al. (2009). Searches on the keyword
“flu” do not result in the flu itself! Of course, the incidence of searches on this keyword is
influenced by flu outbreaks. The interesting point here is that even though searches about flu
do not cause flu, they correlate with it, and may at times even be predictive of it, simply
because searches may lead actual reported levels of flu: the two occur concurrently, but
official flu figures take time to be reported. And while searches may be predictive, the cause
of the searches is the flu itself, one variable feeding on the other in a repeating cycle. Hence,
prediction is a major outcome of correlation, and has led to the recent buzz around the
subfield of "predictive analytics." There are entire conventions devoted to this facet of
correlation, such as the wildly popular PAW (Predictive Analytics World). Pattern
recognition is in; causality is passé.
Data velocity is accelerating. Streams of tweets, Facebook entries, financial information, etc.,
are being generated by more users at an ever-increasing pace. Whereas velocity increases
data volume, often exponentially, it might shorten the window of data retention or
application. For example, high-frequency trading relies on information at the micro-second timescale.
Figure 1.8
Finally, the variety in data is much greater than ever before. Models that relied on just a
handful of variables can now avail of hundreds of variables, as computing power has
increased. The scale of change in volume, velocity, and variety of the data now available
calls for new econometrics, and a range of tools even for single questions. This book aims to
introduce the reader to a variety of modeling concepts and econometric techniques essential
for a well-rounded data scientist.
Data science is more than the mere analysis of large data sets. It is also about the creation of
data. The field of "text-mining" expands available data enormously since there is so much
more text being generated than numbers. The creation of data from varied sources, and its
quantification into information is known as “datafication.”
Data science is also more than “machine learning,” which is about how systems learn from
data. Systems may be trained to use data to make decisions, and training is a continuous
process, where the system updates its learning and (hopefully) improves its decision-making
ability with more data. A spam filter is a good example of machine learning. As we feed it
more data it keeps changing its decision rules, using a Bayesian filter, thereby remaining
ahead of the spammers. It is this ability to adaptively learn that prevents spammers from
gaming the filter, as highlighted in Paul Graham’s interesting essay titled “A Plan for Spam”.
Credit card approvals are also based on neural networks, another popular machine learning
technique.
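As a toy illustration of the learning idea (not the production filters described above), a naive
Bayes classifier can be fit in R with the e1071 package on a few hand-made word-presence
features; the words and labels below are purely hypothetical:

library(e1071)
# tiny labelled training set: word-presence features and a spam/ham label
train = data.frame(
  free  = factor(c("yes", "yes", "no", "no", "no", "yes")),
  offer = factor(c("yes", "no",  "no", "yes", "no", "yes")),
  class = factor(c("spam", "spam", "ham", "ham", "ham", "spam"))
)
filter = naiveBayes(class ~ ., data = train)     # learn word-given-class probabilities
predict(filter, train[1, c("free", "offer")])    # classify a message containing both words

Feeding the model more labelled messages and refitting is the adaptive updating the text
refers to.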
However, machine-learning techniques favour data over judgment, and good data science
requires a healthy mix of both. Judgment is needed to accurately contextualize the setting for
analysis and to construct effective models. A case in point is Vinny Bruzzese, known as the
“mad scientist of Hollywood” who uses machine learning to predict movie revenues. He
asserts that mere machine learning would be insufficient to generate accurate predictions. He
complements machine learning with judgment generated from interviews with screenwriters,
surveys, etc., "to hear and understand the creative vision, so our analysis can be
contextualized.”
Machine intelligence is re-emerging as the new incarnation of AI (a field that many feel has
not lived up to its promise). Machine learning promises and has delivered on many questions
of interest, and is also proving to be quite a game-changer, as we will see later in this chapter
and as discussed in many preceding examples. What makes it so appealing? Hilary Mason
suggests four characteristics of machine intelligence that make it interesting: it is usually
based on a theoretical breakthrough and is therefore well-grounded in science; it changes the
existing economic paradigm; it results in commoditization (e.g., Hadoop); and it makes
available new data that leads to further data science.
1.3.1 Supervised and Unsupervised Learning
Figure 1.9
Systems may learn in two broad ways, through “supervised” and “unsupervised” learning. In
supervised learning, a system produces decisions (outputs) based on input data. Both spam
filters and automated credit card approval systems are examples of this type of learning. So is
linear discriminant analysis (LDA). The system is given a historical data sample of inputs
and known outputs, and it "learns" the relationship between the two using machine learning
techniques, of which there are several. Judgment is needed to decide which technique is most
appropriate for the task at hand.
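As a concrete sketch, linear discriminant analysis can be run in R with the lda function from
the MASS package; here the built-in iris data stands in for a labelled business data set such as
approved/declined credit applications:

library(MASS)
fit = lda(Species ~ ., data = iris)      # learn the input-output relationship from labelled data
pred = predict(fit, iris)$class          # decisions produced from the inputs
table(Predicted = pred, Actual = iris$Species)   # how well the learned rule reproduces the labels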
Unsupervised learning is a process of reorganizing and enhancing the inputs to place
structure on unlabelled data. A good example is cluster analysis, which takes a collection of
entities, each with several attributes, and partitions the entity space into sets or groups based
on the closeness of the attributes of all entities. It reorganizes the data, but it also enhances
the data by labelling the data with additional tags (in this case a cluster number/name). Factor
analysis is also an unsupervised learning technique. The origin of this terminology is unclear,
but it presumably arises from the fact that there is no clear objective function that is
maximized or minimized in unsupervised learning, so no “supervision” is required to reach
an optimal. However, this is not necessarily true in general, and we will see examples of
unsupervised learning (such as community detection in the social web), where the outcome
depends on measurable objective criteria.
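A minimal sketch of cluster analysis in R uses the built-in kmeans function on the iris
measurements; the cluster number attached to each row is exactly the added label referred to
above:

set.seed(1)
km = kmeans(iris[, 1:4], centers = 3)   # partition on the closeness of the four attributes
head(km$cluster)                        # the cluster label attached to each observation
table(km$cluster, iris$Species)         # compare the unsupervised labels with the known species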
Figure 1.10
Data science is about making predictions and forecasts. There is a difference between the
two. The statistician-economist Paul Saffo has suggested that predictions aim to identify one
outcome, whereas forecasts encompass a range of outcomes. To say that “it will rain
tomorrow” is to make a prediction, but to say that “the chance of rain is 40%” (implying that
the chance of no rain is 60%) is to make a forecast, as it lays out the range of possible
outcomes with probabilities. We make weather forecasts, not predictions. Predictions are
statements of great certainty, whereas forecasts exemplify the range of uncertainty. In the
context of these definitions, the term predictive analytics is a misnomer for its goal is to
make forecasts, not mere predictions.
1.3.3 Innovation and Experimentation
Data science is about new ideas and approaches. It merges new concepts with fresh
algorithms. Take for example the A/B test, which is nothing but the online implementation of
a real-time focus group. Different subsets of users are exposed to A and B stimuli
respectively, and responses are measured and analyzed. It is widely used for website design.
This approach has been in place for more than a decade, and in 2011 Google ran more than
7,000 A/B tests. Facebook, Amazon, Netflix, and several other firms use A/B testing widely.
The social web has become a teeming ecosystem for social science experiments. The
potential to learn about human behaviour using innovative methods is much greater now than
ever before.
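As a small sketch of how an A/B comparison might be analysed in R, consider purely
hypothetical counts of conversions out of 1,000 visitors shown each version; a two-sample
test of proportions summarizes the evidence:

ab = prop.test(x = c(52, 79), n = c(1000, 1000))  # conversions under versions A and B
ab    # reports the two proportions, a confidence interval, and a p-value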
Figure 1.11
A good data scientist will take care not to overreach when drawing conclusions from big data.
Because there are so many variables available, and plentiful observations, correlations are
often statistically significant but devoid of basis. In the immortal words of the bard, empirical
results from big data may be "a tale told by an idiot, full of sound and fury, signifying
nothing." One must be careful not to read too much into the data. More data does not
guarantee less noise, and signal extraction may be no easier than with less data.
Adding more columns (variables in the cross-section) to the data set, but not more rows (time
dimension) is also fraught with danger. As the number of variables increases, more
characteristics are likely to be related statistically. Overfitting models in-sample is much
more likely with big data, leading to poor performance out-of-sample.
Researchers also have to be careful to explore the data fully, and not terminate their research
the moment a viable result, especially one that the researcher is looking for, is attained. With
big data, the chances of stopping at a suboptimal or, worse, an intuitively appealing yet
wrong result become very high. It is like asking a class of students a question. In a very
large college class, the chance that someone will quickly provide a plausible yet off-base
answer is very high, which often short-circuits the opportunity for others in the class to think
more deeply about the question and provide a much better answer.
Figure 1.12
Nassim Taleb describes these issues elegantly - “I am not saying there is no information in
big data. There is plenty of information. The problem – the central issue – is that the needle
comes in an increasingly larger haystack.” The fact is, one is not always looking for needles
or Taleb’s black swans, and there are plenty of normal phenomena about which robust
forecasts are made possible by the presence of big data.
1.3.5 Privacy
Figure 1.13
The emergence of big data coincides with a gigantic erosion of privacy. Humankind has
always been torn between the need for social interaction and the urge for solitude and
privacy. One trades off against the other. Technology has simply sharpened the divide and
made the slope of this trade-off steeper. It has provided tools of social interaction that steal
privacy much faster than in the days before the social web.
Rumors and gossip are now old-world phenomena. They required bilateral transmission. The social
web provides multilateral revelation, where privacy no longer capitulates a battle at a time,
but the entire war is lost at one go. And data science is the tool that enables firms,
governments, individuals, benefactors and predators, et al, en masse, to feed on privacy’s
carcass.
The loss of privacy is manifested in the practice of human profiling through data science. Our
web presence increases entropically as we move more of our life’s interactions to the web, be
they financial, emotional, organizational, or merely social. And as we live more and more of
our lives in this new social media, data mining and analytics enable companies to construct
very accurate profiles of who we are, often better than what we might do ourselves. We are
moving from "know thyself" to knowing everything about almost everyone.
If you have a Facebook or Twitter presence, rest assured you have been profiled. For
instance, let’s say you tweeted that you were taking your dog for a walk. Profiling software
now increments your profile with an additional tag - pet owner. An hour later you tweet that
you are returning home to cook dinner for your kids. Your profile is now further tagged as a
parent. As you can imagine, even a small Twitter presence ends up being dramatically
revealing about who you are. Information that you provide on Facebook and Twitter, your
credit card spending pattern, and your blog, allows the creation of a profile that is accurate
and comprehensive, and probably more objective than the subjective and biased opinion you
have of yourself. A machine knows better. And you are the product!
Humankind leaves an incredible trail of “digital exhaust” comprising phone calls, emails,
tweets, GPS information, etc., that companies use for profiling. It is said that 1/3 of people
have a digital identity before being born, initiated with the first sonogram from a routine
hospital visit by an expectant mother. The half-life of non-digital identity or the average age
of digital birth is six months, and within two years 92% of the US population has a digital
identity.
Those of us who claim to be safe from revealing our privacy by avoiding all forms of social
media are simply profiled as agents with a "low digital presence." It might be interesting to
ask such people whether they would like to reside in a profile bucket that is more likely to
attract government interest than a profile bucket with a more average digital presence. In this
age of profiling, the best way to remain inconspicuous is not to hide, but to remain as average
as possible, to be mostly lost within a large herd.
Privacy is intricately and intrinsically connected to security and efficiency. The increase in
transacting on the web, and the confluence of profiling, has led to massive identity theft. Just
as in the old days, when a thief picked your lock and entered your home, most of your
possessions were at risk. It is the same with electronic break-ins, except that there are many
more doors to break in from and so many more windows through which an intruder can
unearth revealing information. And unlike a thief who breaks into your home, a hacker can
reside in your electronic abode for quite some time without being detected, an invisible
parasite slowly causing damage. While you are blind, you are being robbed blind. And unlike
stealing your worldly possessions, stealing your very persona and identity is the cruellest cut
of them all.
Based on buyers’ profiles, the seller will offer each buyer the price he is willing to pay on the
demand curve. Profiling helps sellers capture consumer surplus and eat into the region of
missed sales. Targeting brings benefits to sellers and they actively pursue it. The benefits
outweigh the costs of profiling, and the practice is widespread as a result. Profiling also fine-
tunes price segmentation, and rather than break buyers into a few segments, usually two,
each profile becomes a separate segment, and the granularity of price segmentation is
modulated by the number of profiling groups the seller chooses to model.
Of course, there is an insidious aspect to profiling, which has existed for quite some time,
such as targeting conducted by tax authorities. Also, we will not take kindly to insurance
companies profiling us any more than they already do. Profiling is also undertaken to snare
terrorists. However, there is a danger in excessive profiling. A very specific profile for a
terrorist makes it easier for their ilk to game detection as follows: send several possible
suicide bombers through airport security and see who is repeatedly pulled aside for
screening and who is not. Repeating this exercise enables a terrorist cell to learn which
candidates do not fall into the profile. They may then use them for the execution of a terror
act, as they are unlikely to be picked up for the special screening. The antidote?
Randomization of people picked for a special screening in searches at airports, which makes
it hard for a terrorist to always assume no likelihood of detection through screening.
Automated invasions of privacy naturally lead to human responses, not always rational or
predictable. This is articulated in Campbell’s Law: “The more any quantitative social
indicator (or even some qualitative indicator) is used for social decision-making, the more
subject it will be to corruption pressures and the more apt it will be to distort and corrupt the
social processes it is intended to monitor." We are in for an interesting period of interaction
between man and machine, where the battle for privacy will take centre stage.
My view of data science is one where theories are implemented using data, some of it big
data. This is embodied in an inference stack comprising (in sequence): theories, models,
intuition, causality, prediction, and correlation. The first three constructs in this chain are
from Emanuel Derman’s wonderful book on the pitfalls of models.
Theories are statements of how the world should be or is and are derived from axioms that
are assumptions about the world, or precedent theories. Models are implementations of
theory, and in data science are often algorithms based on theories that run on data. The
results of running a model lead to intuition, i.e., a deeper understanding of the world based on
theory, model, and data. Whereas there are schools of thought that suggest data is all we
need and theory is obsolete, this author disagrees. Still, the unreasonable effectiveness of big
data cannot be denied. Chris Anderson, in his Wired magazine article "The End of Theory,"
argues that with enough data, correlations are sufficient and traditional models become
unnecessary.
In contrast, the academic Thomas Davenport writes in his foreword to Siegel (2013) that
models are key, and should not be eschewed as data grows:
But the point of predictive analytics is not the relative size or unruliness of your data, but
what you do with it. I have found that “big data often means small math,” and many big data
practitioners are content just to use their data to create some appealing visual analytics.
That’s not nearly as valuable as creating a predictive model.
Once we have established intuition for the results of a model, it remains to be seen whether
the relationships we observe are causal, predictive, or merely correlational. Theory may be
causal and tested as such. Granger (1969) causality is often stated in mathematical form
for two stationary time series of data as follows. In a standard bivariate system,

Y_t = a_1 + Σ_{j=1}^{p} b_j Y_{t−j} + Σ_{j=1}^{p} c_j X_{t−j} + u_t
X_t = a_2 + Σ_{j=1}^{p} d_j X_{t−j} + Σ_{j=1}^{p} g_j Y_{t−j} + v_t

X is said to Granger-cause Y if the coefficients c_j on the lagged values of X are jointly
significantly different from zero, i.e., if lagged X adds explanatory power for Y over and
above lagged Y. Causality is a hard property to establish, even with a theoretical foundation,
as the causal effect has to be well-entrenched in the data.
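In R, such a test can be run with the grangertest function from the lmtest package; its
built-in ChickEgg data set gives a quick illustration (a sketch, not tied to the data used
elsewhere in this chapter):

library(lmtest)
data(ChickEgg)                             # annual chicken population and egg production
grangertest(egg ~ chicken, order = 3, data = ChickEgg)   # do lagged chickens help explain eggs?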
And then there is correlation, at the end of the data science inference chain.
Contemporaneous movement between two variables is quantified using correlation. In many
cases, we uncover correlation, but no prediction or causality. A correlation has great value to
firms attempting to tease out beneficial information from big data. And even though it is a
linear relationship between variables, it lays the groundwork for uncovering nonlinear
relationships, which are becoming easier to detect with more data. The surprising parable
about Walmart finding that purchases of beer and diapers seem to be highly correlated
resulted in these two somewhat oddly-paired items being displayed on the same aisle in
supermarkets. Unearthing correlations of sales items across the population quickly leads to
different business models aimed at exploiting these correlations, such as my book buying
inducement from Barnes & Noble, where my “fly and buy” predilection is easily exploited.
Correlation is often all we need, eschewing human cravings for causality. As Mayer-
Schönberger and Cukier (2013) so aptly put it, we are satisfied “... not knowing why but only
what.”
In the data scientist mode of thought, relationships are multifaceted correlations amongst
people. Facebook, Twitter, and many other platforms map human relationships
using graph theory, exploiting the social web in an attempt to understand better how people
relate to each other, intending to profit from it. We use correlations on networks to mine the
social graph, understanding better how different social structures may be exploited. We
answer questions such as where to seed a new marketing campaign, which members of a
network are more important than the others, how quickly will information spread on the
network, i.e., how strong is the “network effect”?
Data science is about the quantization and understanding of human behaviour, the holy grail
of social science. In the following chapters, we will explore a wide range of theories,
techniques, data, and applications of a multi-faceted paradigm. We will also review the new
technologies developed for big data and data science, such as distributed computing using the
Dean and Ghemawat (2004) MapReduce paradigm developed at Google, and implemented as
the open-source project Hadoop at Yahoo!. When data gets super-sized, it is better to move
algorithms to the data than the other way around. Just as big data has inverted database
paradigms, so is big data changing the nature of inference in the study of human behaviour.
Ultimately, data science is a way of thinking, for social scientists, using computer science.
Business analysis requires the use of numerous mathematical methods, from arithmetic and
calculus to statistics and econometrics, with implementations in diverse programming
languages and applications. It calls for strategic skills as well as strong reasoning and the
capacity to pose insightful questions and to deploy data to address questions.
The presence of the web as the main forum for business and marketing has spawned massive
volumes of data, pushing companies to strive to leverage vast knowledge stores to improve
their competitive advantage. As a result, corporations in Silicon Valley (and elsewhere)
recruit a new form of employee known as "data scientists" whose job is to evaluate "big data"
using methods such as those you can learn in this course.
This chapter discusses some of the geometry, statistics, linear algebra, and equations that
you might not have seen for several years. It is more enjoyable than it seems. We will even
learn how to use certain mathematical packages along the way. We will review some of the
typical equations and analyses that you may have encountered in classes you have previously
taken. You will refresh some old ideas, discover new ones, and grow technically proficient
with the software.
It is necessary to start with the basic mathematical constant e = 2.718281828..., which
underlies the exponential function exp(·). This function is also written as e^x, where x may
be a real or complex variable. It occurs in many fields, particularly in finance, where it is
used for continuous compounding and discounting of capital at an interest rate r over a
period t.
The percent change in y: since the natural logarithm ln(·) is the inverse of the exponential
function, if y = e^x then ln(y) = x, and dy/y = dx is the percent change in y. Recall also that
the first derivative of e^x is e^x itself. Continuous compounding arises as the limit

lim_{n→∞} (1 + r/n)^{nt} = e^{rt}

This is the forward value of one dollar. Present value is just the reverse. Therefore, the price
today of a dollar received t years from today is P = e^(−rt). The yield of a bond is:

r = −(1/t) ln(P)
In bond mathematics, the negative of the percentage price sensitivity of a bond to changes in
interest rates is known as “Duration”:

D = −(1/P) (dP/dr) = −(1/e^(−rt)) (−t e^(−rt)) = t

The derivative dP/dr is the price sensitivity of the bond to changes in interest rates and is
negative. Further dividing this by P gives the percentage price sensitivity. The minus sign in
front of the definition of duration converts the negative number to a positive one.
The “Convexity” of a bond is the percentage price sensitivity based on the second derivative,
i.e.,

C = (1/P) (d²P/dr²) = (1/e^(−rt)) (t² e^(−rt)) = t²

Because the second derivative is positive, we know that the bond pricing function is convex.
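These formulas are easy to verify numerically in R, for example with r = 0.05 and t = 10:

r = 0.05; t = 10
P = exp(-r * t)                       # price today of one dollar received in t years
-(1/t) * log(P)                       # recovers the yield, 0.05
dP = function(r) -t * exp(-r * t)     # first derivative of P with respect to r
-(1/P) * dP(r)                        # duration, equal to t = 10
d2P = function(r) t^2 * exp(-r * t)   # second derivative of P with respect to r
(1/P) * d2P(r)                        # convexity, equal to t^2 = 100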
The normal distribution is the workhorse of many models in the social sciences and is
assumed to generate much of the data that comprises the big data universe. Interestingly, most
phenomena (variables) in the real world are not normally distributed. They tend to be “power
law” distributed, i.e., many observations of low value, and very few of high value. The
probability distribution declines from left to right and does not have the characteristic hump
shape of the normal distribution. An example of data that is distributed thus is income
distribution (many people with low income, very few with high income). Other examples are
word frequencies in languages, population sizes of cities, number of connections of people in
a social network, etc.
Still, we do need to learn about the normal distribution because it is important in statistics,
and the central limit theorem does govern much of the data we look at. Examples of
approximately normally distributed data are stock returns and human heights.
If x ∼ N(μ, σ²), that is, x is normally distributed with mean μ and variance σ², then the
probability “density” function for x is:

f(x) = (1/(σ √(2π))) exp(−(x − μ)²/(2σ²))

and the standard normal distribution is the special case with μ = 0 and σ = 1.
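In R, the functions dnorm, pnorm, and rnorm give the normal density, cumulative
distribution, and random draws; a quick check of the formula above:

set.seed(1)
x = rnorm(100000, mean = 0, sd = 1)   # draws from the standard normal
c(mean(x), sd(x))                     # close to 0 and 1
dnorm(0)                              # density at the mean, 1/sqrt(2*pi), about 0.3989
pnorm(1.96)                           # cumulative probability, about 0.975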
We will be using linear algebra in many of the models that we explore in this book. Linear
algebra requires the manipulation of vectors and matrices. We will also use vector calculus.
Vector algebra and calculus are very powerful methods for tackling problems that involve
solutions in spaces of several variables, i.e., in high dimensions. The parsimony of using
vector notation will become apparent as we proceed. This introduction is very light and
meant for a reader who is mostly uninitiated in linear algebra.
Consider a vector of stock returns R = (R_1, R_2, . . ., R_N)′. This is a random vector,
because each return R_i, i = 1, 2, . . ., N, comes from its own distribution, and the returns of
all these stocks are correlated. This random vector's probability law is represented as a joint
or multivariate probability distribution. Note that we use a bold font to
denote the vector R.
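A small simulated sketch of such a random vector in R (the numbers are illustrative only):

set.seed(1)
R = matrix(rnorm(3 * 250, mean = 0.001, sd = 0.02), ncol = 3)  # 250 observations of 3 returns
colnames(R) = c("R1", "R2", "R3")
colMeans(R)   # the mean vector
cov(R)        # the covariance matrix describing how the returns move together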
In this chapter, we develop some expertise in using the R statistical package. There are many
tutorials available now on the web. See the manuals on the R website www.r-project.org.
There is also a great book titled “The Art of R Programming” by Norman Matloff. Another
useful book is “Machine Learning for Hackers” by Drew Conway and John Myles White.
If you want to directly access the system you can issue system commands as follows:
system (“<command>" )
For example, a command such as system("ls -lt | grep <lastname>") will list all directory
entries that contain the given last name in reverse chronological order. Here a Unix command
is being used, so this will not work on a Windows machine, but it will certainly work on a
Mac or Linux box.
However, you are hardly going to be issuing commands at the system level, so you are
unlikely to use the system command very much.
To get started, we need to grab some data. Go to Yahoo! Finance and download some
historical data in an Excel spreadsheet, re-sort it into chronological order, then save it as a
CSV file. Read the file into R as follows:
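A minimal sketch, assuming the file was saved under the hypothetical name stockdata.csv:

stk = read.csv("stockdata.csv")   # read the downloaded CSV file into a data frame
stk = stk[nrow(stk):1, ]          # reverse the row order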
The last command reverses the sequence of the data if required. We can download stock data
using the quantmod package. Note: to install a package you can use the drop-down menus on
Windows and Mac operating systems, and use a package installer on Linux. Or issue the
following command:
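In R, a package is installed with install.packages; for quantmod, and then loading it and
fetching the data (the last two lines are an assumption about how the YHOO object used
below was created):

install.packages("quantmod")
library(quantmod)
getSymbols("YHOO")   # creates an object YHOO; its sixth column holds the adjusted close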
Figure 1.14
>yhoo = as.matrix(YHOO[,6])
We now go ahead and concatenate columns of data into one stock data set.
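Assuming aapl, csco, and ibm were extracted from their tickers in the same way as yhoo
above, one way to combine them is:

stkdata = cbind(yhoo, aapl, csco, ibm)   # one matrix with a column per stock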
Now, compute daily returns. This time, we compute log returns, i.e., continuously
compounded returns. The mean returns are:
> n = length(stkdata[,1])
>n
> rets = log(stkdata[2:n,]/stkdata[1:(n−1),])
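For instance, the mean of each return column can be shown to four significant digits with a
command like:

> print(colMeans(rets), digits = 4)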
Notice the print command that allows you to choose the number of significant digits.
For more flexibility and better handling of data files in various formats, you may also refer to
the readr package. It has many useful functions.
Finding roots of nonlinear equations is often required, and R has several packages for this
purpose. Here we examine a few examples.
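As one illustration, base R's uniroot function finds a root of a function on a given interval:

f = function(x) x^3 - 2 * x - 5
uniroot(f, interval = c(1, 3))$root    # approximately 2.0946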
Figure 1.15
1.9.4 Regression
min_β e′e = (Y − Xβ)′ (Y − Xβ) = Y′(Y − Xβ) − (Xβ)′(Y − Xβ)

Note that this expression is a scalar. Differentiating w.r.t. β gives the following f.o.c.:

−2X′Y + 2X′Xβ = 0
⟹ β = (X′X)⁻¹ (X′Y)
Figure 1.15
The variance of each estimated coefficient is given by the diagonal of Var(β̂) = σ²(X′X)⁻¹;
one should compute this and check that each coefficient in the regression is statistically
significant.
Example: Let’s do a regression and see whether AAPL, CSCO, and IBM can explain the
returns of YHOO. This uses the data we had downloaded earlier.
> Y = as.matrix(rets[,1])
> X = as.matrix(rets[,2:4])   # assuming AAPL, CSCO, and IBM are columns 2-4 of rets
> X = cbind(matrix(1,n,1),X)
Here is a simple regression run on some data from the 2005-06 NCAA
basketball season, i.e., the March Madness statistics. The data is stored in a space-delimited file
called ncaa.txt. We take the performance metric to be the number of games played, with
more successful teams playing more playoff games, and then try to see which variables
explain it best. We apply a simple linear regression using the R command lm, which
stands for “linear model.”
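A sketch of the call, with GMS used as a stand-in name for the games-played column (the
actual column names in ncaa.txt may differ):

ncaa = read.table("ncaa.txt", header = TRUE)   # read the space-delimited file
res = lm(GMS ~ ., data = ncaa)                 # regress games played on all other columns
res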
The command lm returns an "object", here given the name res. This object contains
various details about the regression result, and can then be passed to other functions that will
format and present various versions of the result. For example, using the following command
gives a nicely formatted version of the regression output, and you should try to use it when
presenting regression results.
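In R, the summary function provides such a formatted report for an lm object:

> summary(res)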
(The output is not shown here to not repeat what we saw in the previous regression.) Data
frames are also objects. Here, objects are used in the same way as the term is used in object-
oriented programming (OOP), and similarly, R supports OOP as well.
Direct regression implementing the matrix form is as follows (we had derived this earlier):
wuns = matrix(1, n, 1)
X = cbind(wuns, x)
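The estimator β = (X′X)⁻¹(X′Y) derived earlier maps directly into R's matrix operators:

b = solve(t(X) %*% X) %*% (t(X) %*% Y)   # beta = (X'X)^(-1) (X'Y)
b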
Note that this is the same result as we had before, but it gave us a chance to look at some of
the commands needed to work with matrices in R.
1.9.5 Heteroscedasticity
Simple linear regression assumes that the standard error of the residuals is the same for all
observations. Many regressions suffer from the failure of this condition. The term for this is
“heteroskedastic” errors: “hetero” means different, and “skedastic” refers to the scatter, or
variance, of the errors. In other words, the error variance differs across observations.
We can first test for the presence of heteroscedasticity using a standard Breusch-Pagan test
available in R. This resides in the lmtest package, which is loaded before running the test.
We can see that there is very little evidence of heteroscedasticity in the standard errors, as the
p-value is not small. However, let's go ahead and correct the t-statistics for
heteroscedasticity as follows, using the hccm function (from the car package). The name
“hccm” stands for heteroscedasticity-corrected covariance matrix.
> z = cbind(wuns,x)
> tstats
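A sketch of the full sequence, assuming res is the fitted lm object from the regression above
(bptest comes from the lmtest package and hccm from the car package):

library(lmtest)    # provides bptest
library(car)       # provides hccm
bptest(res)                            # Breusch-Pagan test for heteroscedasticity
vc = hccm(res)                         # heteroscedasticity-corrected covariance matrix
tstats = coef(res) / sqrt(diag(vc))    # corrected t-statistics
tstats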
Resolving heteroscedasticity
● Regression with robust standard errors – the ordinary least squares coefficient estimates
are kept, but the standard errors are replaced with heteroscedasticity-robust ones, since
under heteroscedasticity OLS no longer produces the best linear unbiased estimators.
● Generalized least squares with an unknown form of variance – feasible generalized least
squares estimators model the error variance and recover the best linear unbiased estimator.
When data is autocorrelated, i.e., has a dependence on time, not accounting for it results in
spuriously high statistical significance. Intuitively, this is because observations are treated
as independent when they are correlated in time, and therefore, the true number of
observations is effectively less.
In efficient markets, the correlation of returns from one period to the next should be close to
zero. We use the returns stored in the variable rets (based on Google stock) from much
earlier in this chapter.
In the data there only seems to be statistical significance at the eighth lag. We may regress
leading values on lags to see if the coefficient is significant.
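A minimal sketch, assuming rets here is a single vector of returns:

acf(rets)                                   # sample autocorrelations of the return series, by lag
n = length(rets)
summary(lm(rets[2:n] ~ rets[1:(n - 1)]))    # regress leading values on one lag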
The autocorrelation can be resolved in various ways, for example:
● By including dummy variables in the data
Vector auto-regression is also known as VAR (not the same thing as Value-at-Risk, denoted VaR). VAR is useful for
estimating systems where there are simultaneous regression equations, and the variables
influence each other. So in a VAR, each variable in a system is assumed to depend on lagged
values of itself and the other variables. The number of lags may be chosen by the
econometrician based on what is the expected decay in time-dependence of the variables in
the VAR.
In the following example, we examine the inter-relatedness of the returns of the following three
tickers: SUNW, MSFT, IBM. For vector auto-regressions (VARs), we run the following R
commands:
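The commands are along the following lines (a sketch, assuming rets is a matrix whose
columns are the SUNW, MSFT, and IBM return series):

res = ar(rets)    # multivariate auto-regression; the lag order is chosen by AIC
res$order         # the chosen order
res$ar            # coefficients on the lagged values of each series
res$aic           # AIC by lag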
The “order” of the VAR is how many lags are significant. In this example, the order is 1.
Hence, when the “ar” command is given, it shows the coefficients on the lagged values of the
three series at just one lag. For example, for SUNW, the lagged coefficients are -0.0098,
0.0222, and 0.0021, respectively, on lagged SUNW, MSFT, and IBM. The Akaike Information
Criterion (AIC) tells us which lag is significant, and we see below that this is lag 1.
Interestingly, we see that each of the tickers has a negative relation to its lagged value, but a
positive correlation with the lagged values of the other two stocks. Hence, there is positive
cross autocorrelation amongst these tech stocks.
Prediction trees are the natural outcome of recursive partitioning of the data. They are also a
basic form of clustering carried out at successive stages. Standard cluster analysis results in a
"flat" partition, whereas recursive node splitting creates a multi-level tree of clusters. The
approach used here is CART, which stands for classification and regression trees. Yet
prediction trees are distinct from vanilla clustering in a significant respect – there is a
dependent variable, i.e. a category or a value (e.g. a score) that one is trying to forecast.
Suppose we want to estimate the credit score of a person using age, salary, and education as
explanatory variables, and suppose salary is the best predictor of the three. Then, at the top of
the tree, salary would be the branching variable, i.e., if salary is less than some threshold, we
move down the left branch of the tree; otherwise, we go down the right. It could be that at the
second level we use education to create the next bifurcation, and at the third level we use age.
A variable can also be used again at more than one level.
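A sketch of how such a tree might be fit in R, using the rpart package and a hypothetical data
frame credit with columns score, age, salary, and education:

library(rpart)
# credit is a hypothetical data frame with columns score, age, salary, and education
fit = rpart(score ~ age + salary + education, data = credit)   # recursive partitioning
plot(fit)
text(fit)    # draw the fitted tree with its splitting rules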
1.10 SUMMARY
Data science uses complex machine learning algorithms to build predictive models. In
practice, data science is already helping the airline industry predict travel disruptions. Data
science is an essential part of any industry given the massive amounts of data that are
produced. Data science is one of the most discussed issues in the industry these days. Its
prevalence has increased over the years, and businesses have begun to adopt data science
strategies to grow their market and enhance customer loyalty. Idea analysis is the first
phase of the data science process; its purpose is to clarify the problem by studying the
business model. Because raw data is often not directly usable, data processing is the
most critical part of the data science lifecycle. The data scientist must first review the data
and find any holes or data that may not add meaning. Using various analytical tools and
techniques, data scientists can manipulate the data with the goal of 'discovering' useful
information. The data used for analysis can be from multiple sources and present in various
formats. Machine learning is where computers use algorithms to improve and "learn"
over time as they interact with more data. With machine learning, you can feed a
computer terabytes and petabytes of data, so that it learns to discriminate among patterns and
refines its decision rules, building on the underlying human-written programming, to produce
the desired outcome.
1. What is linear regression? What do the terms p-value, coefficient, and r-squared value
mean? Write the significance of each of these components.
1. Study the image given below. Which graphs are being referred to here?
a. Exploratory
b. Inferential
c. Causal
2. Choose the model that sums the importance over each boosting iteration.
a. Boosted trees
b. Bagged trees
b. Set
c. Value
d. Subset
a. Probability
b. Hypothesis
c. Causal
a. Visual techniques
b. Assumptions
c. Fixed models
a. Frequency
b. Summarized
c. Raw
a. Raw
b. Processed
c. Synchronized
d. Filtered
a. Data cleansing
b. Data integration
c. Data replication
d. Data duplication
a. MCV
b. MCB
c. MARS
d. MCRS
Answers: