Assignment 4 5

Uploaded by

caditi102
DEPARTMENT OF MECHANICAL ENGINEERING

Data Analytics Lab


Assignment No. 4

Name: Aditi Satyajit Chaudhari

MIS: 612210035 Div:1


Batch: B
CODE:
import pandas as pd
import numpy as np

data1 = pd.read_csv('/content/annual-enterprise-survey-2023-financial-yearprovisional.csv')
print('The Data Is :\n', data1)
print('The Top Two Lines of the Data Are :\n', data1.head(2))
print('The Bottom Two Lines of the Data Are :\n', data1.tail(2))
print('The Shape of the Data Are :\n', data1.shape)
print('The Type of Data Are :\n', type(data1))
print('The Total Number of Observations in the Data Are :\n', data1.shape[0])
print('The Total Number of Missing Values Are :\n', data1.isnull().sum().sum())
print('The Total Number of Nan Values Are :\n', data1.isna().sum().sum())

OUTPUT:

The Data Is :
Year Industry_aggregation_NZSIOC Industry_code_NZSIOC \
0 2023 Level 1 99999
1 2023 Level 1 99999
2 2023 Level 1 99999
3 2023 Level 1 99999
4 2023 Level 1 99999
... ... ... ...
50980 2013 Level 3 ZZ11
50981 2013 Level 3 ZZ11
50982 2013 Level 3 ZZ11
50983 2013 Level 3 ZZ11
50984 2013 Level 3 ZZ11
Industry_name_NZSIOC Units Variable_code \
0 All industries Dollars (millions) H01
1 All industries Dollars (millions) H04
2 All industries Dollars (millions) H05
3 All industries Dollars (millions) H07
4 All industries Dollars (millions) H08
... ... ... ...
50980 Food product manufacturing Percentage H37
50981 Food product manufacturing Percentage H38
50982 Food product manufacturing Percentage H39
50983 Food product manufacturing Percentage H40
50984 Food product manufacturing Percentage H41
Variable_name Variable_category \
0 Total income Financial performance
1 Sales, government funding, grants and subsidies Financial performance
2 Interest, dividends and donations Financial performance
3 Non-operating income Financial performance
4 Total expenditure Financial performance
... ... ...
50980 Quick ratio Financial ratios
50981 Margin on sales of goods for resale Financial ratios
50982 Return on equity Financial ratios
50983 Return on total assets Financial ratios
50984 Liabilities structure Financial ratios

Value Industry_code_ANZSIC06
0 930995 ANZSIC06 divisions A-S (excluding classes K633...
1 821630 ANZSIC06 divisions A-S (excluding classes K633...
2 84354 ANZSIC06 divisions A-S (excluding classes K633...
3 25010 ANZSIC06 divisions A-S (excluding classes K633...
4 832964 ANZSIC06 divisions A-S (excluding classes K633...
... ... ...
50980 52 ANZSIC06 groups C111, C112, C113, C114, C115, ...
50981 40 ANZSIC06 groups C111, C112, C113, C114, C115, ...
50982 12 ANZSIC06 groups C111, C112, C113, C114, C115, ...
50983 5 ANZSIC06 groups C111, C112, C113, C114, C115, ...
50984 46 ANZSIC06 groups C111, C112, C113, C114, C115, ...

[50985 rows x 10 columns]


The Top Two Lines of the Data Are :
Year Industry_aggregation_NZSIOC Industry_code_NZSIOC Industry_name_NZSIOC \
0 2023 Level 1 99999 All industries
1 2023 Level 1 99999 All industries

Units Variable_code \
0 Dollars (millions) H01
1 Dollars (millions) H04

Variable_name Variable_category \
0 Total income Financial performance
1 Sales, government funding, grants and subsidies Financial performance

Value Industry_code_ANZSIC06
0 930995 ANZSIC06 divisions A-S (excluding classes K633...
1 821630 ANZSIC06 divisions A-S (excluding classes K633...
The Bottom Two Lines of the Data Are :
Year Industry_aggregation_NZSIOC Industry_code_NZSIOC \
50983 2013 Level 3 ZZ11
50984 2013 Level 3 ZZ11

Industry_name_NZSIOC Units Variable_code \
50983 Food product manufacturing Percentage H40
50984 Food product manufacturing Percentage H41

Variable_name Variable_category Value \
50983 Return on total assets Financial ratios 5
50984 Liabilities structure Financial ratios 46

Industry_code_ANZSIC06
50983 ANZSIC06 groups C111, C112, C113, C114, C115, ...
50984 ANZSIC06 groups C111, C112, C113, C114, C115, ...
The Shape of the Data Are :
(50985, 10)
The Type of Data Are :
<class 'pandas.core.frame.DataFrame'>
The Total Number of Observations in the Data Are :
50985
The Total Number of Missing Values Are :
0
The Total Number of Nan Values Are :
0
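
The two totals above agree because, in pandas, isnull() is an alias of isna(). The following is a minimal sketch of that equivalence and of the difference between per-column counts and the whole-frame total. The toy frame and its values are hypothetical and are not taken from the survey file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with two missing entries (not the survey data)
toy = pd.DataFrame({
    "Year": [2023, 2023, 2022],
    "Value": [930995.0, np.nan, 84354.0],
    "Units": ["Dollars (millions)", None, "Percentage"],
})

# isnull() is an alias of isna(), so both boolean masks are identical
same_mask = toy.isnull().equals(toy.isna())

# First sum(): one count per column; second sum(): total for the whole frame
missing_per_column = toy.isna().sum()
total_missing = int(missing_per_column.sum())

print("Masks identical:", same_mask)
print(missing_per_column)
print("Total missing:", total_missing)
```

Checking the per-column counts before collapsing them to a single number usually makes it easier to see which column needs cleaning.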
DEPARTMENT OF MECHANICAL ENGINEERING
Data Analytics Lab
Assignment No. 5

Name: Aditi Satyajit Chaudhari

MIS: 612210035 Div:1


Batch: B

CODE:
from sklearn.datasets import load_iris

iris = load_iris()
print('Keys of The Data Set :\n', iris.keys())
print('\nNumber of Rows and Columns :\n', iris.data.shape)
print('\nColumn Names :\n', iris.feature_names)
print('\nDataset Description :\n', iris.DESCR)

OUTPUT:
Keys of The Data Set :
dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names',
'filename', 'data_module'])

Number of Rows and Columns :
(150, 4)

Column Names :
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

Dataset Description :
.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, predictive attributes and the class
:Attribute Information:
    - sepal length in cm
    - sepal width in cm
    - petal length in cm
    - petal width in cm
    - class:
        - Iris-Setosa
        - Iris-Versicolour
        - Iris-Virginica

:Summary Statistics:

============== ==== ==== ======= ===== ====================
                Min  Max   Mean    SD   Class Correlation
============== ==== ==== ======= ===== ====================
sepal length:   4.3  7.9   5.84   0.83    0.7826
sepal width:    2.0  4.4   3.05   0.43   -0.4194
petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
============== ==== ==== ======= ===== ====================

:Missing Attribute Values: None
:Class Distribution: 33.3% for each of 3 classes.
:Creator: R.A. Fisher
:Donor: Michael Marshall (MARSHALL%[email protected])
:Date: July, 1988

The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken from
Fisher's paper. Note that it's the same as in R, but not as in the UCI Machine
Learning Repository, which has two wrong data points.

This is perhaps the best known database to be found in the pattern
recognition literature. Fisher's paper is a classic in the field and is
referenced frequently to this day. (See Duda & Hart, for example.) The data
set contains 3 classes of 50 instances each, where each class refers to a type
of iris plant. One class is linearly separable from the other 2; the latter are
NOT linearly separable from each other.
|details-start|
**References**
|details-split|

- Fisher, R.A. "The use of multiple measurements in taxonomic problems"
Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
Mathematical Statistics" (John Wiley, NY, 1950).
- Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.
(Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.
- Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
Structure and Classification Rule for Recognition in Partially Exposed
Environments". IEEE Transactions on Pattern Analysis and Machine
Intelligence, Vol. PAMI-2, No. 1, 67-71.
- Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE Transactions
on Information Theory, May 1972, 431-433.
- See also: 1988 MLC Proceedings, 54-64. Cheeseman et al"s AUTOCLASS II
conceptual clustering system finds 3 classes in the data.
- Many, many more ...

|details-end|
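CODE:
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Build a 3x3 identity matrix and store it in compressed sparse row (CSR) format
dense_matrix = np.eye(3)
sparse_matrix = csr_matrix(dense_matrix)
print(sparse_matrix)

# Load the Iris CSV file and display it
iris_df = pd.read_csv('/Iris.csv')
print(iris_df)

# Drop the 'Id' column; drop() returns a new frame and leaves iris_df unchanged
iris_df_modified = iris_df.drop(columns=['Id'])
print("\n Modified Data Frame (Without 'Id' Column) :")
print(iris_df_modified)

# Drop the row labelled 2 from the original frame, so 'Id' is present again
iris_df_modified = iris_df.drop(index=2)
print(iris_df_modified)

Output :
(0, 0) 1.0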
(1, 1) 1.0
(2, 2) 1.0
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm \
0 1 5.1 3.5 1.4 0.2
1 2 4.9 3.0 1.4 0.2
2 3 4.7 3.2 1.3 0.2
3 4 4.6 3.1 1.5 0.2
4 5 5.0 3.6 1.4 0.2
.. ... ... ... ... ...
145 146 6.7 3.0 5.2 2.3
146 147 6.3 2.5 5.0 1.9
147 148 6.5 3.0 5.2 2.0
148 149 6.2 3.4 5.4 2.3
149 150 5.9 3.0 5.1 1.8

Species
0 Iris-setosa
1 Iris-setosa
2 Iris-setosa
3 Iris-setosa
4 Iris-setosa
.. ...
145 Iris-virginica
146 Iris-virginica
147 Iris-virginica
148 Iris-virginica
149 Iris-virginica

[150 rows x 6 columns]

Modified Data Frame (Without 'Id' Column) :
SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
.. ... ... ... ... ...
145 6.7 3.0 5.2 2.3 Iris-virginica
146 6.3 2.5 5.0 1.9 Iris-virginica
147 6.5 3.0 5.2 2.0 Iris-virginica
148 6.2 3.4 5.4 2.3 Iris-virginica
149 5.9 3.0 5.1 1.8 Iris-virginica

[150 rows x 5 columns]


Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm \
0 1 5.1 3.5 1.4 0.2
1 2 4.9 3.0 1.4 0.2
3 4 4.6 3.1 1.5 0.2
4 5 5.0 3.6 1.4 0.2
5 6 5.4 3.9 1.7 0.4
.. ... ... ... ... ...
145 146 6.7 3.0 5.2 2.3
146 147 6.3 2.5 5.0 1.9
147 148 6.5 3.0 5.2 2.0
148 149 6.2 3.4 5.4 2.3
149 150 5.9 3.0 5.1 1.8

Species
0 Iris-setosa
1 Iris-setosa
3 Iris-setosa
4 Iris-setosa
5 Iris-setosa
.. ...
145 Iris-virginica
146 Iris-virginica
147 Iris-virginica
148 Iris-virginica
149 Iris-virginica
[149 rows x 6 columns]
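
The three "(row, column) value" lines at the start of this output are how SciPy prints a CSR matrix: only the non-zero entries are stored. As a rough illustration of what the CSR layout holds internally, here is a plain-Python sketch that builds the three CSR arrays (values, column indices, row pointers) for the same 3x3 identity matrix. The helper name dense_to_csr is hypothetical and is not part of SciPy.

```python
def dense_to_csr(matrix):
    """Return (data, indices, indptr) for a dense list-of-lists matrix.

    Hypothetical helper for illustration; scipy.sparse.csr_matrix does this
    internally and far more efficiently.
    """
    data, indices, indptr = [], [], [0]
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # the non-zero value itself
                indices.append(col)   # the column it sits in
        indptr.append(len(data))      # where the next row starts in data
    return data, indices, indptr


# Same matrix as np.eye(3), written as nested lists
identity = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
data, indices, indptr = dense_to_csr(identity)

# Walk the arrays row by row to recover the (row, column) value listing
entries = []
for r in range(len(identity)):
    for k in range(indptr[r], indptr[r + 1]):
        entries.append(((r, indices[k]), data[k]))
        print((r, indices[k]), data[k])
```

For an identity matrix the saving is modest, but for large matrices that are mostly zeros this layout stores only the non-zero values plus two index arrays.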

CODE:
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
stats = iris_df.describe()
print(stats)

Output :
sepal length (cm) sepal width (cm) petal length (cm) \
count 150.000000 150.000000 150.000000
mean 5.843333 3.057333 3.758000
std 0.828066 0.435866 1.765298
min 4.300000 2.000000 1.000000
25% 5.100000 2.800000 1.600000
50% 5.800000 3.000000 4.350000
75% 6.400000 3.300000 5.100000
max 7.900000 4.400000 6.900000

petal width (cm)
count 150.000000
mean 1.199333
std 0.762238
min 0.100000
25% 0.300000
50% 1.300000
75% 1.800000
max 2.500000
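
To make the rows of the describe() table less opaque, the sketch below recomputes the same eight statistics with only the Python standard library. It assumes pandas' defaults, namely a sample standard deviation (dividing by n - 1) and linearly interpolated quartiles, and it uses just the six sepal lengths of rows 0 to 5 from the CSV listings above, so its numbers will not match the 150-row table.

```python
import statistics

# SepalLengthCm of rows 0 to 5, as shown in the CSV listings above
sample = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4]

count = len(sample)
mean = statistics.mean(sample)
std = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)

# method="inclusive" interpolates linearly between the sorted data points
q1, median, q3 = statistics.quantiles(sample, n=4, method="inclusive")

summary = {
    "count": count,
    "mean": mean,
    "std": std,
    "min": min(sample),
    "25%": q1,
    "50%": median,
    "75%": q3,
    "max": max(sample),
}
for name, value in summary.items():
    print(f"{name:<6}{value:.6f}")
```

The "50%" row is the median, which is why it can differ noticeably from the mean for skewed columns such as petal length.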

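CODE:
# Assumes iris_df is the frame read from '/Iris.csv' (the output below shows
# its 'Id' and 'Species' columns), not the frame built from load_iris()
iris_df_modified = iris_df.drop(columns=['Id'])
print("\n Modified Data Frame (Without 'Id' Column) :")
print(iris_df_modified)

# Drop the row labelled 2 from the original frame, so 'Id' is present again
iris_df_modified = iris_df.drop(index=2)
print(iris_df_modified)

Output :
Modified Data Frame (Without 'Id' Column) :
SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species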
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
.. ... ... ... ... ...
145 6.7 3.0 5.2 2.3 Iris-virginica
146 6.3 2.5 5.0 1.9 Iris-virginica
147 6.5 3.0 5.2 2.0 Iris-virginica
148 6.2 3.4 5.4 2.3 Iris-virginica
149 5.9 3.0 5.1 1.8 Iris-virginica

[150 rows x 5 columns]


Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm \
0 1 5.1 3.5 1.4 0.2
1 2 4.9 3.0 1.4 0.2
3 4 4.6 3.1 1.5 0.2
4 5 5.0 3.6 1.4 0.2
5 6 5.4 3.9 1.7 0.4
.. ... ... ... ... ...
145 146 6.7 3.0 5.2 2.3
146 147 6.3 2.5 5.0 1.9
147 148 6.5 3.0 5.2 2.0
148 149 6.2 3.4 5.4 2.3
149 150 5.9 3.0 5.1 1.8

Species
0 Iris-setosa
1 Iris-setosa
3 Iris-setosa
4 Iris-setosa
5 Iris-setosa
.. ...
145 Iris-virginica
146 Iris-virginica
147 Iris-virginica
148 Iris-virginica
149 Iris-virginica

[149 rows x 6 columns]
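
The second table above still contains the 'Id' column because drop() returns a new frame and the second call starts again from iris_df. The following is a minimal sketch of that behaviour and of one way to apply both drops together. The three-row frame is a hypothetical stand-in for the Iris CSV.

```python
import pandas as pd

# Hypothetical three-row stand-in for the Iris CSV
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "SepalLengthCm": [5.1, 4.9, 4.7],
    "Species": ["Iris-setosa"] * 3,
})

without_id = df.drop(columns=["Id"])   # new frame; df itself still has 'Id'
without_row = df.drop(index=2)         # starts from df again, so 'Id' is back

# Chaining the calls applies both drops to the same result
both = df.drop(columns=["Id"]).drop(index=2)

print(df.shape, without_id.shape, without_row.shape, both.shape)
```

If the intent is to remove both the column and the row, chaining as in the last statement is one option; assigning each result back to the same variable before the next call is another.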
