
Contingency Table in Python
A contingency table shows the distribution of one variable in rows and another variable in columns, and is used to study the association between the two. More generally, it is a multiway table describing a data set in which each observation belongs to one category for each of several variables; in other words, it is a tally of counts across two or more categorical variables. Contingency tables are also called crosstabs or two-way tables, and are used in statistics to summarize the relationship between several categorical variables.
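As a minimal sketch of the idea, the snippet below tallies two categorical variables from a small made-up survey using the pandas crosstab function. The column names and values are invented for illustration only.

```python
import pandas as pd

# Hypothetical survey data, invented for illustration only
survey = pd.DataFrame({
    "smoker":   ["yes", "no", "no", "yes", "no", "no", "yes", "no"],
    "exercise": ["low", "high", "high", "low", "low", "high", "high", "low"],
})

# Rows are the categories of `smoker`, columns the categories of `exercise`;
# each cell counts the observations that fall into that combination
table = pd.crosstab(survey["smoker"], survey["exercise"])
print(table)
```

Each cell holds a count, so the cells add up to the number of observations (eight here).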
The contingency coefficient is a measure of association that indicates whether two variables or data sets are independent of or dependent on each other. It is also known as Pearson's contingency coefficient.
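The coefficient can be computed from a table of counts. The sketch below assumes the usual definition C = sqrt(chi2 / (chi2 + n)), where chi2 is the chi-squared statistic of the table and n is the total number of observations. The function name and the two small tables are hypothetical, and the code assumes every row and column has a nonzero total.

```python
import numpy as np

def contingency_coefficient(observed):
    """Pearson's contingency coefficient for a 2-D table of counts.

    Assumes every row and every column has a nonzero total.
    """
    observed = np.asarray(observed, dtype=float)
    n = observed.sum()
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    # Counts expected if the row and column variables were independent
    expected = row_totals @ col_totals / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    return np.sqrt(chi2 / (chi2 + n))

# Made-up tables: proportional rows (independent) vs. a diagonal table (dependent)
independent = [[10, 20], [20, 40]]
dependent = [[30, 0], [0, 30]]

print(contingency_coefficient(independent))
print(contingency_coefficient(dependent))
```

A value of 0 means the observed counts match what independence would predict; larger values indicate a stronger association. The maximum depends on the table's dimensions and stays below 1.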
Example
In the example below we use the iris flower data set for analysis. This data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters. We will create contingency tables on these features, which can ultimately help in distinguishing the species from each other.
Reading the Dataset
Example
import pandas as pd

# Load the iris data set from a CSV file in the working directory
datainput = pd.read_csv("iris.csv")

# Show the first five rows
print(datainput.head(5))
Running the above code gives us the following result:
   SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0            5.1           3.5            1.4           0.2  Iris-setosa
1            4.9           3.0            1.4           0.2  Iris-setosa
2            4.7           3.2            1.3           0.2  Iris-setosa
3            4.6           3.1            1.5           0.2  Iris-setosa
4            5.0           3.6            1.4           0.2  Iris-setosa
General Statistics of the Data
Next, we gather summary statistics with the describe() method. It reports the count, mean, standard deviation, minimum, maximum, and quartiles of each numeric column, which gives an idea of how the data is distributed.
Example
import pandas as pd

datainput = pd.read_csv("iris.csv")

# Summary statistics for every numeric column
print(datainput.describe())
Running the above code gives us the following result:
       SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
count     150.000000    150.000000     150.000000    150.000000
mean        5.843333      3.054000       3.758667      1.198667
std         0.828066      0.433594       1.764420      0.763161
min         4.300000      2.000000       1.000000      0.100000
25%         5.100000      2.800000       1.600000      0.300000
50%         5.800000      3.000000       4.350000      1.300000
75%         6.400000      3.300000       5.100000      1.800000
max         7.900000      4.400000       6.900000      2.500000
Data Types
Next, we inspect the data type of each column in the DataFrame.
Example
import pandas as pd

datainput = pd.read_csv("iris.csv")

# Data type of each column
print(datainput.dtypes)
Running the above code gives us the following result:
SepalLengthCm    float64
SepalWidthCm     float64
PetalLengthCm    float64
PetalWidthCm     float64
Species           object
dtype: object
Creating Contingency Table
Now we create a contingency table that counts how often each petal width occurs for each species. For this we use the crosstab function available in pandas and pass these two columns as inputs.
Example
import pandas as pd

datainput = pd.read_csv("iris.csv")

# Rows: distinct petal widths; columns: species; cells: observation counts
width_species = pd.crosstab(datainput['PetalWidthCm'], datainput['Species'], margins=False)
print(width_species)
Running the above code gives us the following result (only some of the rows are shown):
Species       Iris-setosa  Iris-versicolor  Iris-virginica
PetalWidthCm
0.1                     6                0               0
0.2                    28                0               0
0.3                     7                0               0
1.0                     0                7               0
1.1                     0                3               0
1.2                     0                5               0
1.8                     0                1              11
1.9                     0                0               5
2.0                     0                0               6
2.1                     0                0               6
2.5                     0                0               3
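Because petal width is a continuous measurement, a crosstab on the raw values produces one row per distinct width. One common way to get a more compact table is to bin the values first. The sketch below uses a few made-up rows in place of iris.csv and three arbitrary bins; the bin edges and labels are illustrative choices, not part of the original analysis.

```python
import pandas as pd

# A few made-up rows in the same layout as iris.csv (illustration only)
sample = pd.DataFrame({
    "PetalWidthCm": [0.2, 0.2, 0.3, 1.0, 1.3, 1.5, 1.8, 2.1, 2.5],
    "Species": ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 3 + ["Iris-virginica"] * 3,
})

# Group the continuous widths into three hypothetical bins:
# (0.0, 0.8], (0.8, 1.7] and (1.7, 2.6]
width_bin = pd.cut(
    sample["PetalWidthCm"],
    bins=[0.0, 0.8, 1.7, 2.6],
    labels=["narrow", "medium", "wide"],
)

# Cross the binned widths against species
binned = pd.crosstab(width_bin, sample["Species"])
print(binned)
```

With binned rows, each species tends to concentrate in a small number of cells, which makes the relationship between width and species easier to read than a long list of individual widths.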
Multi-variate Contingency Table
In this case we use more than two columns to create the contingency table. Here we use both petal length and petal width for each type of species.
import pandas as pd

datainput = pd.read_csv("iris.csv")

# Rows: (petal length, petal width) pairs; columns: species
length_width_species = pd.crosstab([datainput.PetalLengthCm, datainput.PetalWidthCm], datainput.Species, margins=False)
print(length_width_species)
Running the above code gives us the following result:
Species                     Iris-setosa  Iris-versicolor  Iris-virginica
PetalLengthCm PetalWidthCm
1.0           0.2                     1                0               0
1.1           0.1                     1                0               0
1.2           0.2                     2                0               0
1.3           0.2                     4                0               0
              0.3                     2                0               0
...                                 ...              ...             ...
6.4           2.0                     0                0               1
6.6           2.1                     0                0               1
6.7           2.0                     0                0               1
              2.2                     0                0               1
6.9           2.3                     0                0               1
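The rows of a multi-variate crosstab are indexed by a pair of values, so individual entries are looked up with a tuple. The sketch below shows one way to read such a table, again using a few made-up rows rather than the real iris.csv.

```python
import pandas as pd

# Made-up rows in the same layout as iris.csv (illustration only)
sample = pd.DataFrame({
    "PetalLengthCm": [1.3, 1.3, 1.3, 1.4, 4.5, 4.5, 6.7],
    "PetalWidthCm":  [0.2, 0.2, 0.3, 0.2, 1.5, 1.5, 2.0],
    "Species": ["Iris-setosa"] * 4 + ["Iris-versicolor"] * 2 + ["Iris-virginica"],
})

table = pd.crosstab([sample.PetalLengthCm, sample.PetalWidthCm], sample.Species)

# Counts per species for one (length, width) combination
print(table.loc[(1.3, 0.2)])

# All widths recorded for a given length
print(table.xs(1.3, level="PetalLengthCm"))
```

The first lookup returns one count per species; the second returns a smaller table indexed by petal width only.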