Module 1

Uploaded by

retodo8981
Module 1: Basic data operations

Fundamentals of computations

Hardware Software
Computers generally operate in a two-step cycle:

1. Fetch the instruction
2. Execute the instruction

These two steps are repeated over and over again.
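The fetch and execute cycle can be sketched as a small loop. This is only a toy model: the instruction names (LOAD, ADD, HALT), the accumulator, and the program counter are assumptions made for illustration, not the instruction set of any real CPU.

```python
# A toy "computer": memory holds the program, a program counter (pc)
# points at the next instruction, and an accumulator holds the result.
# The instruction names are made up for this illustration.
program = [
    ("LOAD", 3),     # put 3 in the accumulator
    ("ADD", 1),      # add 1 to the accumulator
    ("ADD", 5),      # add 5 to the accumulator
    ("HALT", None),  # stop and report the accumulator
]


def run(program):
    pc = 0           # program counter
    accumulator = 0
    while True:
        # Step 1: fetch the instruction the program counter points at
        opcode, operand = program[pc]
        pc += 1
        # Step 2: execute the instruction
        if opcode == "LOAD":
            accumulator = operand
        elif opcode == "ADD":
            accumulator += operand
        elif opcode == "HALT":
            return accumulator
        else:
            raise ValueError(f"unknown instruction: {opcode}")


result = run(program)
print(result)  # prints 9
```

The loop body is exactly the two steps listed above, repeated until a HALT instruction is reached.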

So, a computer is basically a combination of an information processing system (CPU, memory) and an environment (input and output devices).

An instruction, or a set of instructions, processed by the CPU is called a program.

The text form of these instructions is called CODE.


The CPU can only process instructions that are in machine language.

Machine language, or machine code, consists of binary digits (0 and 1).

CODE written by humans is in text form, not binary. So, to process these instructions, the text CODE must be converted into binary form.

This can be achieved via

1. Compiling
2. Interpreting
Compilation: The process of converting text CODE into machine code. Here, the entire set of instructions is first converted into machine code, which can later be executed to achieve the desired result.

Example: C/C++

Interpretation: An interpreter converts and executes the instructions of the CODE together, in a line-by-line manner.

Example: Python

Compiled CODEs are generally faster. However, the generated machine code is platform dependent.

Interpreted CODEs are generally more flexible and dynamic, but are usually slower.
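The idea of "convert first, execute later" can be illustrated, loosely, with Python's built-in compile() and exec(). This is a sketch of the two separate steps only: in the standard Python implementation the converted form is bytecode for the Python virtual machine, not native machine code as produced by a C/C++ compiler. The variable names are illustrative.

```python
source = "a = 3 + 1\nb = a * 2\n"

# Conversion step: the whole text is translated once, up front.
# In CPython the result is bytecode, not native machine code.
code_object = compile(source, "<example>", "exec")

# Execution step: run the already-converted code in a fresh namespace.
namespace = {}
exec(code_object, namespace)

print(namespace["a"], namespace["b"])  # prints 4 8
```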
A programming language generally has the following components:

1. Data type: Every object defined in the program has some basic characteristic.

Example: real or integer.

2. Data structure: Data can be represented in a complex form.

Example: Arrays.

3. Variables: A place to store data or values.

4. Assignment: A statement that sets and/or resets the value stored in a variable.

Example: a=3+1.

5. Function: A command that accomplishes a specific task. It generally takes an argument and returns a value.

Example: print("hello world") ---> hello world

6. Control structure: Control structures are specific commands that dictate which part of the program the computer will execute next.

Example: the if command
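The six components can be seen together in a few lines of Python. This is a minimal sketch; all names (count, ratio, values, total, square, message) are chosen only for illustration.

```python
# 1. Data types: every object has a type
count = 3          # integer
ratio = 0.5        # real (float)

# 2. Data structure: a list groups several values
values = [4, 8, 15, 16]

# 3 and 4. Variable and assignment: `total` stores a value,
# and the assignment statement sets (or resets) it
total = 3 + 1

# 5. Function: takes an argument and returns a value
def square(x):
    return x * x

# 6. Control structure: `if` decides which branch runs next
if total > count:
    message = "total is larger"
else:
    message = "count is larger or equal"

print(type(count).__name__, type(ratio).__name__)  # prints int float
print(square(total))                               # prints 16
print(message)                                     # prints total is larger
```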
Data visualization
Data

Data is a set of values of qualitative or quantitative variables.

Data in itself has little meaning.

Information

Information is obtained via post-processing of data.

Information has a specific meaning to it.

Information is specific to the requirement.


Plotting

A plot is a graphical technique for representing a data set, exhibiting the relationship between two or more variables.

Plots are an important post-processing method.

Data sets can be represented graphically in various forms:

Line chart
Pie chart
Log plot
Line chart

Also known as a line plot, curve plot, or line graph.

It displays information as a series of data points, called 'markers', connected by line segments.

A line chart is often used to visualize a trend in data.

A multi-line chart can be used to compare the behavior of the same variables in different environments or conditions.
Bar Graphs

A bar graph, or bar chart, shows a comparison between discrete categories.

Pie Chart

A pie chart is a circular statistical graph, with each slice representing the numerical proportion of a category.
Histogram

A histogram represents the distribution of numerical data. The X-axis and the Y-axis represent the value ranges (bins) and the frequencies, respectively.

Scatter plots

A scatter plot uses dots to represent values for two different numeric variables. If the points are color-coded, one additional variable can be displayed.
Contour plot

A contour plot is a graphical technique for representing a 3-dimensional surface by plotting constant z slices, called contours, in a 2-dimensional format.

3-D plots

A 3-D plot is a three-dimensional representation of data. It is also known as a surface plot.

Here, individual data points show the relationship between a designated dependent variable (Y) and two independent variables (X and Z).
Data interpretation
Data interpretation is the process of reviewing data through some predefined
processes which will help assign some meaning to the data and arrive at a relevant
conclusion.

It involves taking the result of data analysis, making inferences on the relations studied, and using them to draw conclusions.

Data analysis is the process of ordering, categorizing, manipulating, and summarizing data to obtain answers to research questions. It is usually the first step taken towards data interpretation.

There are two main methods of data interpretation:

1. Qualitative methods
2. Quantitative methods
Qualitative methods

Qualitative data, also known as categorical data, is a collection of information that is divided into groups.

Qualitative data can take on numerical values, but those numbers have little mathematical meaning.

Qualitative data can be classified into two categories:

1. Nominal data: This data type is used to name variables without providing any numerical value.

2. Ordinal data: This data type has a set order or scale to it. However, the order has no standard scale on which the difference between positions is measured.
Qualitative data is analyzed using measures such as the mode or the frequency distribution.

Qualitative data is analyzed graphically using bar charts and pie charts.

In most cases, qualitative data has to be converted into numerical data before processing. However, these values do not exhibit quantitative characteristics: arithmetic operations cannot be meaningfully performed on them.
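As a rough illustration of handling qualitative data, the sketch below counts categories and converts ordered categories into integer codes. The sample values (blood groups, ratings) and all variable names are hypothetical, made up for this example.

```python
from collections import Counter

# Nominal data: names only, no order (hypothetical survey answers)
blood_groups = ["A", "O", "B", "O", "AB", "O", "A"]

# A frequency distribution and the most frequent category are
# meaningful summaries for categories
frequencies = Counter(blood_groups)
most_common, most_common_count = frequencies.most_common(1)[0]

# Ordinal data: categories with an order but no fixed spacing
levels = ["low", "medium", "high"]
ratings = ["medium", "high", "low", "medium", "medium"]

# Convert categories to integer codes before processing.
# The codes only record order: "high" (2) is not twice "medium" (1).
codes = [levels.index(r) for r in ratings]

print(most_common, most_common_count)  # prints O 3
print(codes)                           # prints [1, 2, 0, 1, 1]
```

The counts in `frequencies` are what a bar chart or pie chart of this data would display.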
Quantitative methods

Quantitative data, also known as numerical data, is always collected in number form.

Quantitative data can be classified into two categories:

1. Discrete data: This data type represents countable items. It takes on values that can be grouped into a list.

2. Continuous data: This data type can assume any value within a range. Continuous data represents measurements, so its values cannot be counted, but they can be measured.

Continuous data can be further divided into two categories: (a) interval data and (b) ratio data.
a. Interval data: This data type is measured along a scale in which each point is placed at an equal distance from the next.

b. Ratio data: This data type has the same properties as interval data, with an equal and definitive ratio between data points and an absolute "zero" treated as the point of origin. Thus, there can be no negative numerical value in ratio data.

Quantitative data can be analyzed using a variety of methods, such as:

Descriptive analysis methods: mean, median, mode, etc.

Inferential statistical methods: trend analysis
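The descriptive measures named above can be computed with Python's standard statistics module. A minimal sketch, using a made-up list of discrete measurements:

```python
import statistics

# Hypothetical discrete measurements
data = [2, 4, 4, 5, 7, 9, 4]

mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)  # prints 5 4 4
```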


Quantitative data can be used for estimation or enumeration.

One can perform mathematical operations on quantitative data.

Quantitative data may be visualized in different ways depending on the type of data being investigated, e.g., histogram, scatter plot, or line plot.

Note: One of the key factors that influence data is bias. The presence of any bias may lead to false results. Thus, it is important to avoid bias during data collection and interpretation.
Data fitting and interpolation/extrapolation
Interpolation and extrapolation are types of estimation: methods of constructing new data points from a discrete set of known data points. Interpolation estimates points within the range of the known data, while extrapolation estimates points outside that range.

To interpolate or extrapolate, we require a function that is most suitable to represent the given data set.

The function required for interpolation is obtained via curve fitting.

Curve fitting, or data fitting, is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points.
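As a simple illustration, the sketch below estimates new points with straight-line segments between neighbouring known points. This piecewise-linear function is an assumption chosen for brevity, standing in for a properly fitted curve; the function name linear_estimate and the sample data are hypothetical.

```python
def linear_estimate(x, xs, ys):
    """Estimate y at x from known points (xs sorted in ascending order).

    Inside the data range this interpolates between the two
    neighbouring points; outside it, the nearest end segment is
    extended (extrapolation).
    """
    # Pick the segment; stay on the first or last one when x is outside
    i = 0
    while i < len(xs) - 2 and x > xs[i + 1]:
        i += 1
    x0, x1 = xs[i], xs[i + 1]
    y0, y1 = ys[i], ys[i + 1]
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)


xs = [1, 2, 3, 4]
ys = [1, 4, 9, 16]   # samples of y = x^2

print(linear_estimate(2.5, xs, ys))  # interpolation, prints 6.5
print(linear_estimate(5, xs, ys))    # extrapolation, prints 23.0
```

The true values of x^2 at these points are 6.25 and 25, so the estimate outside the data range is noticeably less accurate than the one inside it.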
Data fitting and interpolation
Data can be fitted using a polynomial equation

$y = f(x)$

Linear fitting, or first-degree polynomial: $f(x) = ax + b$

Second-degree polynomial: $f(x) = ax + bx^2 + c$

Third-degree polynomial: $f(x) = ax + bx^2 + cx^3 + d$
Curve fitting

Linear fitting

What values of ‘a’ and ‘b’ are most appropriate?
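One common answer is the least-squares criterion: choose a and b so that the sum of squared differences between the data and the line is as small as possible. The sketch below uses the standard closed-form solution for a straight line; the function name fit_line and the sample data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of f(x) = a*x + b; returns (a, b)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Closed-form least-squares slope and intercept
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b = (sum_y - a * sum_x) / n
    return a, b


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]   # lies exactly on y = 2x + 1

a, b = fit_line(xs, ys)
print(a, b)  # prints 2.0 1.0
```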


Polynomial fitting

What values of a0, a1, a2, a3, ..., aj are most appropriate?

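For a second-degree polynomial $y = a_0 + a_1 x + a_2 x^2$ fitted to $n$ data points $(x_i, y_i)$, the least-squares coefficients satisfy the normal equations:

\[
\begin{aligned}
n\,a_0 + a_1 \sum x_i + a_2 \sum x_i^2 &= \sum y_i \\
a_0 \sum x_i + a_1 \sum x_i^2 + a_2 \sum x_i^3 &= \sum x_i y_i \\
a_0 \sum x_i^2 + a_1 \sum x_i^3 + a_2 \sum x_i^4 &= \sum x_i^2 y_i
\end{aligned}
\]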
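The three equations above form a linear system in a0, a1 and a2, which can be solved numerically. The sketch below builds that system with NumPy and cross-checks the result against np.polyfit; the sample data is hypothetical and chosen to lie exactly on a known parabola.

```python
import numpy as np

# Hypothetical data lying exactly on y = 1 + 2x + 3x^2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1 + 2 * x + 3 * x ** 2

n = len(x)
# Coefficient matrix and right-hand side of the three equations
A = np.array([
    [n,              x.sum(),        (x ** 2).sum()],
    [x.sum(),        (x ** 2).sum(), (x ** 3).sum()],
    [(x ** 2).sum(), (x ** 3).sum(), (x ** 4).sum()],
])
rhs = np.array([y.sum(), (x * y).sum(), (x ** 2 * y).sum()])

a0, a1, a2 = np.linalg.solve(A, rhs)
print(np.round([a0, a1, a2], 6))

# Cross-check: np.polyfit returns the highest power first
print(np.round(np.polyfit(x, y, 2), 6))
```

For higher degrees or poorly scaled data, a library routine such as np.polyfit is usually the safer choice, since the system above can become numerically ill-conditioned.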
Data plotting - Python
import matplotlib.pyplot as plt

x = [1,2,3,4]

y = [1,2,3,4]

plt.plot(x, y)

plt.show()
Data plotting - Python
import matplotlib.pyplot as plt

x = [1,2,3,4]

y = [1,2,3,4]

plt.plot(x, y)

plt.xlabel('x - axis')

plt.ylabel('y - axis')

plt.show()
Data plotting - Python
import matplotlib.pyplot as plt

x = [1,2,3,4]

y = [1,2,3,4]

plt.plot(x, y)

plt.xlabel('x - axis')

plt.ylabel('y - axis')

plt.title('My first graph')

plt.show()
Data plotting - Python
import matplotlib.pyplot as plt
x = [1,2,3,4]
y = [1,2,3,4]
plt.plot(x, y, color='green', linestyle='dashed', linewidth=3,
         marker='o', markerfacecolor='blue', markersize=12)
plt.xlabel('x - axis')
plt.ylabel('y - axis')
plt.title('My first graph')
plt.show()
Data plotting - Python
import matplotlib.pyplot as plt
x1 = [1,2,3,4]
y1 = [1,2,3,4]
plt.plot(x1, y1)
x2 = [1,2,3,4]
y2 = [4,3,2,1]
plt.plot(x2,y2)
plt.xlabel('x - axis', fontsize=16)
plt.ylabel('y - axis', fontsize=16)
plt.title('Two lines on same graph!')
plt.show()
Data plotting - Python
import matplotlib.pyplot as plt

x1 = [1,2,3,4]
y1 = [1,2,3,4]
plt.plot(x1, y1, label = "line 1")

x2 = [1,2,3,4]
y2 = [4,3,2,1]
plt.plot(x2, y2, label = "line 2")

plt.xlabel('x - axis')
plt.ylabel('y - axis')
plt.title('Two lines on same graph!')
# legend() uses the label given to each plot() call
plt.legend()
plt.show()
Data plotting - Python
import matplotlib.pyplot as plt

import numpy as np

y = [1,2,3,4]

abc = ['one', 'two', 'three', 'four']

plt.bar(np.arange(len(y)),y,tick_label = abc, width=0.4)

plt.show()
Data plotting - Python
import matplotlib.pyplot as plt
import numpy as np
y1 = [1,2,3,4]
y2 = [1,2,3,4]
tick_label = ['one', 'two', 'three', 'four']
positions = np.arange(len(y1))
# Shift each series by half a bar width so the two bars sit side by side
plt.bar(positions - 0.2, y1, width=0.4, label='a')
plt.bar(positions + 0.2, y2, width=0.4, label='b')
# Put one tick label at the centre of each pair of bars
plt.xticks(positions, tick_label)
plt.title('my bar chart')
plt.legend()
plt.show()
Data plotting - Python
import matplotlib.pyplot as plt
ages =[2,5,70,40,30,45,50,45,43,40,44,60,7,13,57,18,90,77,32,21,20,40]
value_range = (0, 100)  # renamed so the built-in range() is not shadowed
bins = 10
plt.hist(ages, bins=bins, range=value_range)
plt.xlabel('age')
plt.ylabel('# of people')
plt.title('My histogram')
plt.show()
Data plotting - Python
import matplotlib.pyplot as plt
ages =[2,5,70,40,30,45,50,45,43,40,44,60,7,13,57,18,90,77,32,21,20,40]
value_range = (0, 100)  # renamed so the built-in range() is not shadowed
bins = 10
plt.hist(ages, bins=bins, range=value_range, edgecolor='black')
plt.xlabel('age')
plt.ylabel('# of people')
plt.title('My histogram')
plt.show()
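The bar heights in the histogram are simply counts of how many values fall into each bin. The sketch below reproduces those counts by hand for the same ages, using 10 equal-width bins over 0 to 100; the variable names low, high, width and counts are illustrative.

```python
ages = [2, 5, 70, 40, 30, 45, 50, 45, 43, 40, 44,
        60, 7, 13, 57, 18, 90, 77, 32, 21, 20, 40]
low, high = 0, 100
bins = 10
width = (high - low) / bins   # each bin spans 10 years

counts = [0] * bins
for age in ages:
    # Index of the bin this value falls in; the top edge (100)
    # is folded into the last bin
    index = min(int((age - low) / width), bins - 1)
    counts[index] += 1

print(counts)  # prints [3, 2, 2, 2, 7, 2, 1, 2, 0, 1]
```

The tallest bar is the 40 to 50 bin, which holds 7 of the 22 values.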
Data plotting - Python
import matplotlib.pyplot as plt

activities = ['eat', 'sleep', 'work', 'play']

slices = [3, 7, 8, 6]

colors = ['r', 'y', 'g', 'b']

plt.pie(slices, labels = activities, colors=colors)

plt.show()
Data plotting - Python
import matplotlib.pyplot as plt

activities = ['eat', 'sleep', 'work', 'play']

slices = [3, 7, 8, 6]

colors = ['r', 'y', 'g', 'b']

plt.pie(slices, labels = activities, colors=colors,autopct = '%1.2f%%')

plt.show()
Data plotting - Python
import matplotlib.pyplot as plt
import numpy as np
x = [1,2,3,4,5,6,7,8,9,10]
y = [2,4,5,7,6,8,9,11,12,12]
plt.scatter(x, y)
plt.xlabel('x - axis')
plt.ylabel('y - axis')
plt.title('scatter plot')
# np.arange excludes the stop value, so add 1 to include the last tick
plt.xticks(np.arange(min(x), max(x) + 1, 1.0))
plt.show()
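Color-coded points can display one additional variable on a scatter plot. A minimal sketch, assuming a hypothetical third variable called temperature; the `c` and `cmap` arguments of scatter() map its values to colors, and the colorbar shows the scale.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 7, 6]
temperature = [10, 15, 20, 25, 30]  # hypothetical third variable

fig, ax = plt.subplots()
# c= supplies one value per point; cmap= chooses the color scale
points = ax.scatter(x, y, c=temperature, cmap="viridis")
fig.colorbar(points, ax=ax, label="temperature")
ax.set_xlabel("x - axis")
ax.set_ylabel("y - axis")
ax.set_title("color-coded scatter plot")
plt.show()
```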
Data plotting - Python
import matplotlib.pyplot as plt

import numpy as np

feature_x = np.linspace(0, 3.0, 30)

feature_y = np.linspace(0, 3.0, 30)

[X, Y] = np.meshgrid(feature_x, feature_y)

Z = X ** 2 + Y ** 2

plt.contourf(X,Y,Z)

plt.show()
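Data plotting - Python
import matplotlib.pyplot as plt
# Registers the "3d" projection on older Matplotlib versions
from mpl_toolkits import mplot3d
ax = plt.axes(projection="3d")
x=[1,2,3,4,5]
y=[3,5,2,1,4]
z=[5,2,1,4,3]
# Draw a line through the (x, y, z) points on the 3-D axes
ax.plot(x, y, z)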
plt.show()
