Unit 3
Libraries
NumPy
NumPy stands for Numerical Python. It is a Python library used for working with arrays.
It also has functions for working in the domain of linear algebra, Fourier transforms, and
matrices. In Python we have lists that serve the purpose of arrays, but they are slow to
process. NumPy aims to provide an array object that is much faster than traditional
Python lists. The array object in NumPy is called ndarray, and it provides many supporting
functions that make working with ndarray very easy. Arrays are used very frequently in
data science, where speed and resources are very important. Unlike lists, NumPy arrays are
stored in one contiguous block of memory, so processes can access and manipulate
them very efficiently. This behavior is called locality of reference in computer science,
and it is the main reason why NumPy is faster than lists. NumPy is also optimized to work
with the latest CPU architectures using the concept of vectorized processing.
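To make the vectorization point concrete, here is a minimal sketch (variable names are illustrative only) that computes the same squares once with a Python loop over a list and once with a single ndarray expression. The results match, but the array version does the work in NumPy's compiled loops rather than in the Python interpreter.

```python
import numpy as np

values = list(range(1_000))              # ordinary Python list
squares_list = [v * v for v in values]   # element-by-element loop in Python

arr = np.arange(1_000)                   # the same values as a contiguous ndarray
squares_arr = arr * arr                  # one vectorized operation over the whole array

print(squares_list[:5])
print(squares_arr[:5])
```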
import numpy as np
#creating and displaying array
data=[[1,2,6],[3,5,9]]
data=np.array(data)
print("Array Data")
print(data)
Every array has a shape, a tuple indicating the size of each dimension, and a dtype, an
object describing the data type of the array.
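A quick sketch of the two attributes on the 2 x 3 array used above; the exact integer dtype printed depends on the platform (commonly int64).

```python
import numpy as np

data = np.array([[1, 2, 6], [3, 5, 9]])
print(data.shape)  # (2, 3): 2 rows, 3 columns
print(data.dtype)  # integer dtype inferred from the data (platform dependent)
```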
The easiest way to create an array is to use the array function. This accepts any sequence-like
object (including other arrays) and produces a new NumPy array containing the
passed data.
data=(2,6,9)
data=np.array(data)
print("Array Data")
print(data)
import numpy as np
#creating and displaying array
data=[[1,2,6],[3,5,9]]
data=np.array(data)
print("Array Data")
print(data)
NumPy arrays have many attributes. ndim is the attribute that represents the number of
dimensions (axes) of the ndarray.
#displaying dimension
print(data.ndim)
Unless explicitly specified, np.array tries to infer a good data type for the array that it
creates. The data type is stored in a special dtype object.
data=[2.4,3.9,-1.2]
data=np.array(data)
#print data type of array elements
print(data.dtype)
We can also specify the data type of array elements explicitly while creating ndarrays.
data=np.array([1,3,5,8],dtype='int64')
print(data)
The numpy.arange() function is used to generate an array with evenly spaced values
within a specified interval. The function returns a one-dimensional array of type
numpy.ndarray.
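A few calls sketching how the start, stop, and step arguments of arange behave; as with Python's range, the stop value is excluded.

```python
import numpy as np

a1 = np.arange(5)            # stop only: 0 up to (not including) 5
a2 = np.arange(2, 10, 3)     # start, stop, step
a3 = np.arange(0, 1, 0.25)   # a float step gives a float array
print(a1)
print(a2)
print(a3)
```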
array: Convert input data (list, tuple, array, or other sequence type) to an ndarray
either by inferring a dtype or explicitly specifying a dtype. Copies the input data
by default.
asarray: Convert input to ndarray, but do not copy if the input is already an
ndarray.
arange: Generate an array with evenly spaced values within a specified interval (like the
built-in range, but returning an ndarray).
ones, ones_like: Produce an array of all 1’s with the given shape and dtype.
ones_like takes another array and produces a ones array of the same shape and
dtype.
d=np.ones_like(data)
print(d)
zeros, zeros_like: Like ones and ones_like but producing arrays of 0’s instead.
d=np.zeros_like(data)
print(d)
empty, empty_like: Create new arrays by allocating new memory but, unlike ones and
zeros, do not populate them with any values (the contents are arbitrary).
eye, identity: Create a square N x N identity matrix (1’s on the diagonal and 0’s
elsewhere).
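The following sketch exercises several of the functions listed above, including the copy difference between array and asarray. The values produced by np.empty are whatever happens to be in memory, so only its shape is printed.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.asarray(a)            # input is already an ndarray: no copy, same object
c = np.array(a)              # array copies the input by default
print(b is a, c is a)        # True False

print(np.ones((2, 3)))       # 2x3 array of 1.0 (default dtype float64)
print(np.zeros(4, dtype=int))
e = np.empty((2, 2))         # allocated but uninitialized
print(e.shape)               # (2, 2)
print(np.eye(3))             # 3x3 identity matrix
print(np.identity(3))        # same result as np.eye(3)
```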
The data type or dtype is a special object containing the information the ndarray needs
to interpret a chunk of memory as a particular type of data.
data=np.array([1,3,5,8],dtype='int64')
print(data)
The numerical dtypes are named in the format: a type name, like float or int, followed by
a number indicating the number of bits per element. We can explicitly convert or cast an
array from one dtype to another using ndarray’s astype method.
data=np.array([1,3,5,8],dtype='int64')
print(data)
data=data.astype('float64')
print(data)
data=data.astype(np.int32)
print(data)
If we have an array of strings representing numbers, we can use astype to convert them
to numeric form.
data=np.array(['2.5','3.7','9.1'],dtype=np.bytes_) # np.bytes_ replaces np.string_, which was removed in NumPy 2.0
print(data)
data=data.astype('float64')
print(data)
print(data.dtype)
If casting fails for some reason (like a string that cannot be converted to float64), a
ValueError will be raised.
data=np.array(['2.5','3.7','9.1f'],dtype=np.bytes_)
print(data)
data=data.astype('float64') #Error: raises ValueError because '9.1f' is not a valid number
print(data)
print(data.dtype)
import numpy as np
a = np.array([[1., 2., 3.], [4., 5., 6.]])
print(a)
r=a*a
print("Element-wise multiplication of arrays:")
print(r)
r=a+a
print("Sum of arrays:")
print(r)
Arithmetic operations with scalars propagate the scalar value to each element in the
NumPy array.
import numpy as np
a = np.array([[1., 2., 3.], [4., 5., 6.]])
print(a)
r=a/2
print("Half of array elements:")
print(r)
r=a**0.5
print("Square root of array elements:")
print(r)
import numpy as np
a = np.arange(10)
print("Array Elements:")
print(a)
print("Element at index 3")
print(a[3])
print("Element from index 3-6")
print(a[3:7])
If we assign a scalar value to a slice, as in a[4:6] = 11, the value is propagated (or
broadcast, as we will say henceforth) to the entire selection. An important first distinction from lists is
that array slices are views on the original array. This means that the data is not copied,
and any modifications to the view will be reflected in the source array as demonstrated
below.
import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9])
aslice=a[3:7]
aslice[1]=15 #modification will be reflected in original array
print("Array Elements:")
print(a)
a = [1,2,3,4,5,6,7,8,9]
aslice=a[3:7]
aslice[1]=15 #modification will not be reflected in original list
print("List Elements:")
print(a)
As NumPy has been designed with large data use cases in mind, we could imagine
performance and memory problems if NumPy copied data instead of creating views. If we
want a copy of a slice of an ndarray instead of a view, we need to explicitly copy the
array; for example, arr[5:8].copy().
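A small sketch contrasting a view with an explicit copy; np.shares_memory is used here only to confirm which object shares its buffer with the original array.

```python
import numpy as np

a = np.arange(10)
view = a[5:8]               # a view: shares memory with a
sliced_copy = a[5:8].copy() # an independent copy

sliced_copy[:] = 0          # does not touch a
print(a)
view[:] = 99                # writes through to a
print(a)
print(np.shares_memory(a, view), np.shares_memory(a, sliced_copy))  # True False
```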
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Array Element at index 2")
print(a[2])
print("Array Element at index 1,2")
print(a[1][2])
print(a[1,2])#Equivalent to a[1][2]
import numpy as np
a = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print("Array Elements")
print(a)
print("Array Element at index 1")
print(a[1])
print("Array Element at index 1,1")
print(a[1,1])
print("Array Element at index 1,1,2")
print(a[1,1,2])
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Array Elements:")
print(a)
print("First Two Rows")
print(a[0:2])# Or print(a[:2])
print("First Two Columns of array")
print(a[:,0:2])
print("2x2 slice in top-left corner")
print(a[0:2,0:2])
As with 1D arrays, we can take slices of a 2D array and update them, and the update is
reflected in the original array.
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
aslice=a[0:2,0:2]
aslice[:,:]=0
print("Array Elements")
print(a)
We can create a mask that selects all elements of a that are greater than 20.
boolean_mask = a> 20
The above statement creates a Boolean mask that evaluates to True for elements that are
greater than 20, and False for elements that are less than or equal to 20. The resulting
mask is an array stored in the boolean_mask variable, as shown below.
import numpy as np
a = np.array([12, 24, 16, 21, 32, 29, 7, 15])
boolean_mask = a > 20
print(boolean_mask)
print(a[boolean_mask])
a[boolean_mask]=0#sets all elements greater than 20 to zero
print(a)
Fancy Indexing
In NumPy, fancy indexing allows us to use an array of indices to access multiple array
elements at once. Fancy indexing can perform more advanced and efficient array
operations, including conditional filtering, sorting, and so on.
Example
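import numpy as np
a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
ind = [1, 3, 5] # array of indices to select
print(a[ind]) # elements at indices 1, 3 and 5: [2 4 6]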
We can also use fancy indexing on multi-dimensional arrays, and the concept is the same:
a list of indices on the first axis selects whole rows.
import numpy as np
a=np.array([[1,3,6],[2,7,1],[1,9,4]])
ind=[0,2]
print(a[ind])#prints row 0 and row 2
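Passing one index array per axis selects individual elements rather than whole rows: the pairs (rows[i], cols[i]) are picked. The index lists below are illustrative. Unlike slicing, fancy indexing always copies the data into a new array.

```python
import numpy as np

a = np.array([[1, 3, 6], [2, 7, 1], [1, 9, 4]])
rows = [0, 2]
cols = [1, 2]
print(a[rows, cols])        # elements (0, 1) and (2, 2)
print(a[rows][:, [0, 2]])   # rows 0 and 2, then columns 0 and 2 of those rows
```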
Example
import numpy as np
a = np.arange(10)
print("Dataset:",a)
s=np.sqrt(a)#unary universal function
print("Square Roots:",s)
e=np.exp(a)
print("Exp(a):",e)
x=np.random.randn(10)
y=np.random.randn(10)
m=np.max(x)
print("Maximum=",m)
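The example above uses unary universal functions (sqrt, exp) and a reduction (np.max). NumPy also has binary ufuncs that take two arrays and combine them element by element; a minimal sketch with np.maximum and np.add on small hand-written arrays:

```python
import numpy as np

x = np.array([1.5, -2.0, 3.0])
y = np.array([0.5, 4.0, -1.0])
print(np.maximum(x, y))  # element-wise maximum of the two arrays
print(np.add(x, y))      # same as x + y
print(np.max(x))         # np.max reduces one array to a single value
```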
Array Functions
import numpy as np
a = np.arange(10)
print("a=",a)
np.save('some_array', a)
b=np.load('some_array.npy')
print("b=",b)
c = np.arange(20)
print("c=",c)
np.savez('array_archive.npz', x=a, y=c)
arch = np.load('array_archive.npz')
print("Arrays in Archive:")
for k in arch:
    print(arch[k])
Example
import numpy as np
a = np.loadtxt('/content/drive/My Drive/test.txt', delimiter=',')
print(a)
np.savetxt('/content/drive/My Drive/test1.txt', a)
print("File is saved")
import numpy as np
# invoking genfromtxt method to read test.txt file
content = np.genfromtxt("/content/drive/My Drive/test.txt", dtype=str,
encoding = None, delimiter=",")
# print file data on console
print("File data:", content)
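The file examples above depend on a Google Drive path. As a self-contained sketch, the same savetxt, loadtxt, and genfromtxt calls can be tried against a temporary directory; the file name demo.txt is hypothetical.

```python
import os
import tempfile
import numpy as np

a = np.array([[1.0, 2.5], [3.0, 4.5]])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")         # hypothetical file name
    np.savetxt(path, a, delimiter=",")           # write as comma-separated text
    b = np.loadtxt(path, delimiter=",")          # read it back as float64
    s = np.genfromtxt(path, dtype=str, encoding=None, delimiter=",")  # read as strings
print(np.array_equal(a, b))  # True
print(s.shape)               # (2, 2)
```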
Linear Algebra
Linear algebra, like matrix multiplication, decompositions, determinants, and other
square matrix math, is an important part of any array library. Unlike some languages such as
MATLAB, multiplying two two-dimensional arrays with * in NumPy is an element-wise product
instead of a matrix dot product. numpy.linalg has a standard set of matrix
decompositions and functions like inverse and determinant; commonly used ones include
inv, det, solve, eig, and svd.
#Matrix multiplication
import numpy as np
x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[6, 23], [-1, 7], [8, 9]])
z=x.dot(y)
print(z)
r=np.dot(x, y)#equivalent to x.dot(y)
print(r)
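A small sketch of inv, det, and solve from numpy.linalg on a hand-picked 2 x 2 matrix. The @ operator is used as shorthand for matrix multiplication (equivalent to np.dot for 2D arrays).

```python
import numpy as np

m = np.array([[2.0, 1.0], [1.0, 3.0]])
print(np.linalg.det(m))        # determinant: 2*3 - 1*1 = 5 (up to rounding)
m_inv = np.linalg.inv(m)       # matrix inverse
print(m @ m_inv)               # approximately the 2x2 identity matrix
b = np.array([3.0, 5.0])
x = np.linalg.solve(m, b)      # solves m @ x = b without forming the inverse
print(x)                       # approximately [0.8, 1.4]
```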
import numpy as np
np.random.seed(100)
d=np.random.randint(0,10)
print("d=",d)
samples = np.random.normal(size=(4, 4))
print(samples)
d=np.random.permutation([1,2,3])
print("d=",d)
Series
Example
import pandas as pd
import numpy as np
obj = pd.Series([4, 7, -5, 3]) #series data structure
print(obj.values) #displaying values in the data structure
print(obj[1]) #value at index 1
obj[2]=5 #modifying value
print(obj.values)
print(obj[[1,2,3]]) #displaying values at index 1, 2, and 3
obj = pd.Series([4, 7, -5, 3], index=['a', 'b', 'c', 'd'])
print(obj.values)
print(obj['a'])
obj=obj*2 # scalar multiplication
print(obj.values)
print(obj[obj>0])#boolean indexing
print(np.exp(obj)) #using universal functions
DataFrame
import pandas as pd
data = {'State': ['Bagmati', 'Koshi', 'Karnali', 'Lumbini', 'Gandaki'],
'Year': [2000, 2001, 2002, 2001, 2002]}
frame1 = pd.DataFrame(data)#creating dataframe
print(frame1)
frame2 = pd.DataFrame(data,columns=["State","Year","Debt"])#creating data frame; Debt has no data yet, so it is all NaN
print(frame2)
print(frame2["State"])#displaying column State
obj=pd.Series([2,5,3,3,4])
frame2["Debt"]=obj
print(frame2)#displaying data frame
print(frame2.values)#displaying in 2D array format
Index Objects
Pandas’s Index objects are responsible for holding the axis labels and other metadata (like
the axis names). Any array or other sequence of labels used when constructing a Series
or DataFrame is internally converted to an Index. Index objects are immutable and thus
can’t be modified by the user.
import pandas as pd
s= pd.Series(range(3), index=[1, 2, 3])
print(s)
print(s.index)
print(pd.Index(s.values, dtype='int64')) # pd.Int64Index was removed in pandas 2.0
#s.index[1]='d'# index is immutable
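The commented-out line above would fail if executed. A minimal sketch showing the error being raised and caught; replacing the whole index with a new one is still allowed.

```python
import pandas as pd

s = pd.Series(range(3), index=[1, 2, 3])
try:
    s.index[1] = 'd'           # attempt to modify one label of the Index in place
    modified = True
except TypeError as err:
    modified = False
    print("TypeError:", err)   # Index objects do not support item assignment

s.index = ['a', 'b', 'c']      # assigning a complete new index is fine
print(s.index)
```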
Essential Functionalities
This section discusses fundamental mechanics of interacting with the data contained in a
Series or DataFrame.
Reindexing
A critical method on pandas objects is reindex, which means to create a new object with
the data conformed to a new index. Calling reindex on a Series rearranges the data
according to the new index, introducing missing values if any index values were not
already present.
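A minimal reindex sketch with illustrative values: the labels are rearranged into the new order, and the new label 'e' gets a missing value (NaN).

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])  # 'e' was not present, so it becomes NaN
print(obj2)
```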
For ordered data like time series, it may be desirable to do some interpolation or filling
of values when reindexing. The method option allows us to do this, using a method such
as ffill which forward fills the values.
import pandas as pd
obj = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
print(obj)
obj1=obj.reindex(range(6), method='ffill')
print(obj1)
obj2=obj.reindex(range(6), method='bfill')
print(obj2)
Dropping one or more entries from an axis is easy if you have an index array or list
without those entries. Because building such a list can require a bit of munging and set
logic, the drop method returns a new object with the indicated value or values deleted from an axis.
import pandas as pd
import numpy as np
obj = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])
print(obj)
obj1 = obj.drop('c')
print(obj1)
obj2 = obj.drop(['b','d'])
print(obj2)
Series indexing works analogously to NumPy array indexing, except we can use the
Series’s index values instead of only integers.
import pandas as pd
import numpy as np
obj = pd.Series(np.arange(4), index=['a', 'b', 'c', 'd'])
print(obj.iloc[2]) #position 2, same as obj['c']
print(obj['c'])
print(obj[1:3])
print(obj[['b','c','d']])
Slicing with labels behaves differently than normal Python slicing in that the endpoint is
inclusive. Setting values through such a slice works just as we would expect.
import pandas as pd
import numpy as np
obj = pd.Series(np.arange(4), index=['a', 'b', 'c', 'd'])
print(obj.iloc[2]) #position 2, same as obj['c']
print(obj['c'])
print(obj[1:3])
print(obj['b':'d'])
obj['b':'c'] = 5
print(obj)
Indexing into a DataFrame is for retrieving one or more columns either with a single
value or sequence:
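import pandas as pd
import numpy as np
data = pd.DataFrame(np.arange(16).reshape((4, 4)), index=['r1', 'r2', 'r3', 'r4'],
                    columns=['c1', 'c2', 'c3', 'c4'])
print(data['c2']) # a single column label returns a Series
print(data[['c3', 'c1']]) # a list of labels returns a DataFrame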
Another use case is in indexing with a Boolean DataFrame, such as one produced by a
scalar comparison.
import pandas as pd
import numpy as np
data = pd.DataFrame(np.arange(16).reshape((4, 4)),index=['r1', 'r2', 'r3',
'r4'],
columns=['c1', 'c2', 'c3', 'c4'])
print(data < 5)
data[data < 5] = 0
print(data)
One of the most important pandas features is the behavior of arithmetic between objects
with different indexes. When adding together objects, if any index pairs are not the same,
the respective index in the result will be the union of the index pairs; labels found in only
one of the objects get missing values (NaN) in the result.
import pandas as pd
s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])
s3=s1+s1
print(s3)
s3=s1+s2
print(s3)
In the case of DataFrame, alignment is performed on both the rows and the columns:
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=list('bcd'),
index=['1', '2', '3'])
df2 = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=list('bde'),
index=['1', '2', '3', '4'])
print(df1)
print(df2)
df=df1+df2 # aligned on rows and columns; labels not in both frames give NaN
print(df)
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.arange(12).reshape((3, 4)), columns=list('abcd'),
index=['1', '2', '3'])
df2 = pd.DataFrame(np.arange(20).reshape((4, 5)), columns=list('abcde'),
index=['1', '2', '3', '4'])
df=df1.add(df2, fill_value=0)
print(df)
df=df1.sub(df2, fill_value=0)
print(df)
df=df1.mul(df2, fill_value=0)
print(df)
df=df1.div(df2, fill_value=0)
print(df)
As with NumPy arrays, arithmetic between a DataFrame and a Series is well-defined. In such a
case, the operation is performed using the concept of broadcasting.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(12).reshape((3, 4)))
s=pd.Series([2,4,5,7])
df1=df+s
print(df)
print(s)
print(df1)
By default, arithmetic between DataFrame and Series matches the index of the Series on
the Data Frame’s columns, broadcasting down the rows. If an index value is not found in
either the DataFrame’s columns or the Series’s index, the objects will be reindexed to form
the union.
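A sketch of that union behavior with illustrative data: the Series has the label 'f', which is not a column, and lacks 'd', so both of those columns come out as NaN in the result.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                     columns=list('bde'), index=['r1', 'r2', 'r3', 'r4'])
series = pd.Series(range(3), index=['b', 'e', 'f'])  # no 'd', extra 'f'
result = frame + series  # Series index matched against the columns
print(result)            # columns b, d, e, f; d and f are all NaN
```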
import numpy as np
import pandas as pd
frame = pd.DataFrame(np.random.randn(4, 3),
columns=list('bde'),index=['r1', 'r2', 'r3', 'r4'])
print(frame)
frame=np.abs(frame)
print(frame)
import numpy as np
import pandas as pd
frame = pd.DataFrame(np.random.randn(4, 3),
columns=list('bde'),index=['r1', 'r2', 'r3', 'r4'])
print(frame)
frame=np.abs(frame)
print(frame)
f= lambda x: x.max() - x.min()
fr=frame.apply(f,axis=0)
print(fr)
The function passed to apply need not return a scalar value; it can also return a Series
with multiple values.
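import numpy as np
import pandas as pd
def f(x):
    return pd.Series([x.min(), x.max()], index=['min', 'max'])
frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['r1', 'r2', 'r3', 'r4'])
print(frame.apply(f)) # one row per returned label: min and max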
With a DataFrame, we can sort by index on either axis. The data is sorted in ascending
order by default, but can be sorted in descending order too.
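A short sketch of sort_index on both axes, using an illustrative frame:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(8).reshape((2, 4)),
                     index=['three', 'one'], columns=['d', 'a', 'b', 'c'])
print(frame.sort_index())                         # rows sorted: one, three
print(frame.sort_index(axis=1))                   # columns sorted: a, b, c, d
print(frame.sort_index(axis=1, ascending=False))  # columns: d, c, b, a
```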
To sort a Series by its values, use its sort_values method. Any missing values are sorted
to the end of the Series by default.
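For example, with an illustrative Series containing two NaN values:

```python
import numpy as np
import pandas as pd

obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])
sorted_obj = obj.sort_values()
print(sorted_obj)  # -3, 2, 4, 7, then the two NaN values
```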
On a DataFrame, we may want to sort by the values in one or more columns. To do so,
pass one or more column names to the by option:
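A minimal sketch with an illustrative frame; when several columns are given, later columns break ties in the earlier ones.

```python
import pandas as pd

frame = pd.DataFrame({'b': [4, 7, -3, 2], 'a': [0, 1, 0, 1]})
by_b = frame.sort_values(by='b')          # rows ordered by column b
by_ab = frame.sort_values(by=['a', 'b'])  # by a first, then by b within equal a
print(by_b)
print(by_ab)
```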
Ranking is closely related to sorting, assigning ranks from one through the number of
valid data points in an array. Ties are broken according to a rule. By default, rank breaks
ties by assigning each group the mean rank. We can also rank in descending order.
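A sketch on an illustrative Series with two pairs of tied values, showing the default mean-rank rule, the method='first' rule (ties broken by order of appearance), and descending order:

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])
print(obj.rank())                  # tied values share the mean of their ranks
print(obj.rank(method='first'))    # ties broken by order of appearance
print(obj.rank(ascending=False))   # largest value gets rank 1
```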
import numpy as np
import pandas as pd
frame = pd.DataFrame({'b': [4.3, 7, -3, 2], 'a': [0, 1, 0, 1], 'c': [-2,
5, 8, -2.5]})
print(frame)
fr=frame.rank(axis=1)
print(fr)
Series may have duplicate indices. The index’s is_unique property can tell you whether
its values are unique or not. Data selection is one of the main things that behaves
differently with duplicates. Indexing a label that has multiple entries returns a Series,
while a label with a single entry returns a scalar value.
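A minimal sketch with an illustrative duplicated index:

```python
import pandas as pd

obj = pd.Series(range(5), index=['a', 'a', 'b', 'b', 'c'])
print(obj.index.is_unique)  # False
print(obj['a'])             # duplicated label: a Series with two entries
print(obj['c'])             # unique label: a scalar
```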
Example
import numpy as np
import pandas as pd
df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5], [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'], columns=['one', 'two'])
print(df)
print(df.sum())
print(df.sum(axis=1))
print(df.mean())
print(df.describe())
Covariance
Covariance is a measure of the relationship between two random variables. It measures
the direction of the relationship between two variables. If the covariance for any two
variables is positive, both variables move in the same direction. If the covariance is
negative, the variables move in opposite directions.
A square matrix that provides the covariance between each pair of components (or elements)
of a given random vector is called a covariance matrix.
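A small sketch computing covariance with pandas on made-up columns. Series.cov and DataFrame.cov compute the sample covariance (dividing by n - 1), and DataFrame.cov returns the covariance matrix of all column pairs.

```python
import pandas as pd

df = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0],
                   'y': [2.0, 4.0, 6.0, 8.0],
                   'z': [4.0, 3.0, 2.0, 1.0]})
print(df['x'].cov(df['y']))  # positive: x and y move together
print(df['x'].cov(df['z']))  # negative: z falls as x rises
print(df.cov())              # 3x3 covariance matrix
```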
Correlation is a statistical measure that expresses the extent to which two variables are
linearly related (meaning they change together at a constant rate). It’s a common tool for
describing simple relationships without making a statement about cause and effect. The
sample correlation coefficient, r, quantifies the strength of the relationship. A correlation
coefficient close to 0, whether positive or negative, implies little or no linear relationship
between the two variables. A correlation coefficient close to +1 means a positive
relationship between the two variables, with increases in one of the variables being
associated with increases in the other variable. A correlation coefficient close to -1
indicates a negative relationship between two variables, with an increase in one of the
variables being associated with a decrease in the other variable. The most common
formula is the Pearson correlation coefficient, used for linear dependency between two
data sets, and is given below.
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)\left(n\sum y^2 - \left(\sum y\right)^2\right)}}$$
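As a check on the formula, the sketch below computes r directly from the sums for a small made-up data set and compares it with pandas' Series.corr, which uses Pearson correlation by default. Here n = 5, Σxy = 66, Σx = 15, Σy = 20, Σx² = 55 and Σy² = 86, so r = 30 / √1500 ≈ 0.775.

```python
import numpy as np
import pandas as pd

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up data
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = len(x)

# Pearson r from the raw-sum formula
num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
den = np.sqrt((n * np.sum(x**2) - np.sum(x)**2) * (n * np.sum(y**2) - np.sum(y)**2))
r = num / den

print(r)                                # about 0.775
print(pd.Series(x).corr(pd.Series(y)))  # pandas gives the same value
```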
Example
# importing pandas as pd
import pandas as pd
s=pd.Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])
uniques = s.unique()
print(uniques)
l=s.value_counts()
print(l)
m = s.isin(['b', 'c'])
print(m)
print(s[m])
Example 2
import pandas as pd
import numpy as np
data = pd.DataFrame([[np.nan, 6.5, 3.], [np.nan, np.nan, 2.0],[np.nan,
np.nan, np.nan], [np.nan, 6.5, 3.]])
print(data)
d1=data.dropna()
print(d1)
d2=data.fillna(0)
print(d2)
d3=data.dropna(how='all')
print(d3)
d4=data.dropna(how='all',axis=1)
print(d4)
By calling fillna with a dict, we can use a different fill value for each column. fillna
returns a new object by default, but we can modify the existing object in place by passing
inplace=True. The same interpolation methods available for reindexing can be used with
fillna (in recent pandas versions through the ffill and bfill methods). With fillna we can do
lots of other things with a little creativity. For example, we might pass the mean or median
value of a Series.
import pandas as pd
import numpy as np
data = pd.DataFrame([[np.nan, 6.5, 3.], [np.nan, np.nan, 2.0],[np.nan,
np.nan, np.nan], [np.nan, 6.5, 3.]])
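d=data # d and data refer to the same DataFrame
d.fillna(0) # returns a new object; d itself is unchanged
print(d)
d.fillna(0,inplace=True) # modifies d (and therefore data) in place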
print(d)
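The code above only fills with a constant. A sketch of the other options from the paragraph on an illustrative frame: a per-column dict, column means, and forward filling. In pandas 2.1 and later, fillna(method='ffill') is deprecated in favor of the ffill() method used here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': [np.nan, 5.0, np.nan]})
by_column = df.fillna({'a': 0, 'b': -1})  # different fill value per column
with_mean = df.fillna(df.mean())          # each column filled with its own mean
filled_forward = df.ffill()               # carry the last valid value forward
print(by_column)
print(with_mean)
print(filled_forward)
```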
Hierarchical Indexing
Hierarchical indexing, also known as multi-level indexing, is a way of organizing data in
Pandas with multiple levels of row or column labels. This allows you to work with more
complex data structures than a simple table with one row and one column of labels. For
example, imagine we have a dataset with sales data for a company, broken down by
region and by quarter. You could organize this data with a hierarchical index that has
two levels: one for the region and one for the quarter.
Example
import pandas as pd
index = [('Kathmandu', 'Q1'), ('Kathmandu', 'Q2'),('Kathmandu', 'Q3'),
('Kathmandu', 'Q4'),
('Pokhara', 'Q1'), ('Pokhara', 'Q2'), ('Pokhara',
'Q3'),('Pokhara', 'Q4')]
sales = [350, 500,325, 475,200, 300,350,250]
sales_data = pd.Series(sales, index=index)
print(sales_data)
print(sales_data.index)
for x in index:
    if x[1] == 'Q2':
        print(x, sales_data[x])
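The loop above works but scans every label. A sketch of the same data built with pd.MultiIndex.from_tuples; the level names region and quarter are assumptions added for readability. Selecting by the outer level, cross-sectioning an inner level with xs, and unstack() then need no loop.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('Kathmandu', 'Q1'), ('Kathmandu', 'Q2'), ('Kathmandu', 'Q3'), ('Kathmandu', 'Q4'),
     ('Pokhara', 'Q1'), ('Pokhara', 'Q2'), ('Pokhara', 'Q3'), ('Pokhara', 'Q4')],
    names=['region', 'quarter'])  # level names are illustrative
sales_data = pd.Series([350, 500, 325, 475, 200, 300, 350, 250], index=index)

print(sales_data['Kathmandu'])               # all quarters for one region
print(sales_data.xs('Q2', level='quarter'))  # Q2 for every region
print(sales_data.unstack())                  # regions as rows, quarters as columns
```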
Panel Data
The Panel in pandas was a three-dimensional container of data. It has three main
axes: items (axis 0), where each item corresponds to a DataFrame; major_axis (axis 1), the
rows; and minor_axis (axis 2), the columns. A panel could be created with the pandas
Panel() constructor. Note that Panel was deprecated in pandas 0.20 and removed in pandas
0.25; three-dimensional data is now usually stored in a DataFrame with a hierarchical
(MultiIndex) row index.
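As a rough sketch of that replacement approach (not the old Panel API), the items/major/minor layout can be expressed as a DataFrame whose row index has two levels; all labels below are hypothetical.

```python
import numpy as np
import pandas as pd

# items become the outer row level, major axis the inner row level,
# and minor axis the columns (labels are hypothetical)
items = ['item1', 'item2']
major = ['r1', 'r2', 'r3']
minor = ['c1', 'c2']
index = pd.MultiIndex.from_product([items, major], names=['item', 'major'])
panel_like = pd.DataFrame(np.arange(12).reshape(6, 2), index=index, columns=minor)
print(panel_like)
print(panel_like.loc['item2'])  # the 2D slice for one item
```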
Matplotlib Library
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations
in Python. Matplotlib makes difficult things possible and simple things easy. matplotlib.pyplot is
a collection of functions that make matplotlib work like MATLAB. Each pyplot function makes
some change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines
in a plotting area, decorates the plot with labels, etc. Once we are done, we can save it with
savefig() or display it with show().
Example
from matplotlib import pyplot as plt
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]
# create a line chart, years on x-axis, gdp on y-axis
plt.plot(years, gdp, color='green', marker='o', linestyle='solid')
# add a title
plt.title("Nominal GDP")
# add a label to the x and y-axis
plt.ylabel("Billions")
plt.xlabel("Years")
plt.show()
Bar Charts
A bar chart, often known as a bar graph, is a diagram that displays categorical data as rectangular
bars with heights or lengths proportional to the values they stand for. You can plot the bars either
vertically or horizontally. A vertical bar chart may also be referred to as a column chart.
Comparisons among distinct categories are displayed in a bar graph. The comparison categories
are shown on one axis of the chart, and a measured value is shown on the other axis.
Example
from matplotlib import pyplot as plt
Country = ["Nepal", "Sri Lanka", "Bangladesh", "India",
"Bhutan","Maldives","Pakistan","Afghanistan"]
GDP_growth_rate = [6.4, 4.5, 8.3, 7.4, 5.8,8.7,3.2,2.1]
# plot bars with Country as x-coordinate and GDP_growth_rate as height
plt.figure(figsize=(8,4))
plt.bar(Country, GDP_growth_rate)
plt.title("GDP Growth Rates of SAARC Countries") # add a title
plt.show()
Calling plt.barh(y, x) plots a horizontal bar chart, with the categories y on the vertical axis and the bar lengths x on the horizontal axis.
Example
from matplotlib import pyplot as plt
Country = ["Nepal", "Sri Lanka", "Bangladesh", "India",
"Bhutan","Maldives","Pakistan","Afghanistan"]
GDP_growth_rate = [6.4, 4.5, 8.3, 7.4, 5.8,8.7,3.2,2.1]
# plot horizontal bars with Country on the y-axis and GDP_growth_rate as bar length
plt.figure(figsize=(8,4))
plt.barh(Country, GDP_growth_rate)
plt.title("GDP Growth Rates of SAARC Countries") # add a title
plt.xlabel("GDP Growth Rate") # label the x-axis
plt.ylabel("Country") # label the y-axis
plt.show()
Stacked bar charts have each bar series stacked on top of the previous one. An unstacked (grouped)
bar chart compares the groups side by side; a stacked bar chart instead shows how each individual
part contributes to the total of a category. A stacked bar plot is used to represent a grouping
variable, where group counts or relative proportions are plotted in a stacked manner. Occasionally,
it is used to display relative proportions that sum to 100%.
Example
# importing package
import matplotlib.pyplot as plt
import numpy as np
# create data
x = ['A', 'B', 'C', 'D']
y1 = np.array([10, 20, 10, 30])
y2 = np.array([20, 25, 15, 25])
y3 = np.array([12, 15, 19, 6])
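y4 = np.array([10, 29, 13, 19])
# plot bars in stacked manner: each series starts on top of the previous ones
plt.bar(x, y1, color='r')
plt.bar(x, y2, bottom=y1, color='b')
plt.bar(x, y3, bottom=y1+y2, color='y')
plt.bar(x, y4, bottom=y1+y2+y3, color='g')
plt.legend(["y1", "y2", "y3", "y4"])
plt.show()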
Line Charts
A line chart is a type of chart that provides a visual representation of data in the form of points that
are connected by straight line segments. Line charts are a good choice for showing trends. These charts are
used to represent the relation between two variables X and Y on different axes.
Example
import matplotlib.pyplot as plt
quantity=[1123,1256,1289,1378,1456,1367,1256]
amount=[2246,2512,2588,2702,2912,3214,3250]
Month=["Jan","Feb","Mar","Apr","May","June","July"]
plt.figure(figsize=(8,4))
plt.plot(Month,quantity,marker='x')
plt.plot(Month,amount,marker='o')
plt.title('Sales Trend')
plt.xlabel('Month')
plt.ylabel('Sales')
plt.legend(["Sales Quantity","Sales Amount"],loc="upper left")
plt.show()
Example
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Load the dataset into a Pandas DataFrame
df = pd.read_csv("/content/drive/My Drive/HistoricalPrices.csv")
plt.figure(figsize=(8,4))
Scatterplots
A scatter plot (aka scatter chart, scatter graph) uses dots to represent values for two
different numeric variables. The position of each dot on the horizontal and vertical axis
indicates values for an individual data point. Scatter plots are used to observe
relationships between variables.
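A minimal scatter plot sketch using plt.scatter with made-up data (study hours and exam scores are purely illustrative). The Agg backend line is only there so the sketch also runs without a display; omit it in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

hours = [1, 2, 3, 4, 5, 6, 7, 8]           # made-up data
score = [52, 55, 61, 64, 70, 72, 79, 85]

plt.figure(figsize=(6, 4))
points = plt.scatter(hours, score, color='purple', marker='o')  # one dot per (x, y) pair
plt.title("Study Hours vs Exam Score")
plt.xlabel("Hours studied")
plt.ylabel("Score")
plt.show()
```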
Plotting Maps
Maps have been used for centuries to help people navigate and understand their
surroundings. In the age of big data, maps have become an essential tool for data
visualization. They allow us to visualize geographic data in a way that is intuitive and interactive.
Plotly is a powerful data visualization library for Python that allows you to create a wide
range of interactive visualizations, including maps. One of the advantages of Plotly is
that it is designed to work seamlessly with other Python libraries, such as Pandas and
NumPy. This makes it easy to import and manipulate data and to create visualizations
that are customized to your specific needs.
The Scattergeo() function (from plotly.graph_objects) is used to create a scatter plot on a
geographic map. It lets us plot points on a map where each point represents a specific
geographic location, like a city or a landmark. For example, if we have a dataset that
contains the latitude and longitude coordinates of different cities around the world, we
can use Scattergeo() to plot each city on a world map.
Example
import plotly.express as px
import pandas as pd
Chaco
mayavi
Other Packages