
Unit 2 2

The document explains the concepts of correlation and covariance, highlighting their differences in measuring relationships between variables. It provides examples of using Python's pandas library to calculate correlation and covariance, as well as methods for finding unique values, counting occurrences, and interacting with text files and databases. Additionally, it covers how to fetch data from web APIs and save it into pandas DataFrames.

Uploaded by

servereurope5678
Copyright
© All Rights Reserved

Correlation vs Covariance – Explained Simply

1. Correlation:

Shows how strongly two variables are related and whether the relationship is positive or negative.

• Value ranges between -1 and +1:

o +1: Perfect positive correlation (when one increases, the other increases)

o -1: Perfect negative correlation (when one increases, the other decreases)

o 0: No linear correlation (the variables show no straight-line relationship, though they may still be related in some other way)

Example:

import pandas as pd

data = {

'StudyHours': [1, 2, 3, 4, 5],

'Scores': [50, 60, 65, 80, 85]

}

df = pd.DataFrame(data)

print(df.corr())

Output:

            StudyHours    Scores
StudyHours    1.000000  0.987878
Scores        0.987878  1.000000

High correlation (about 0.99): in this sample, students who studied more hours also scored higher. Correlation alone does not show that one causes the other.
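As a rough check on what corr() reports, the Pearson coefficient can be computed by hand from the deviations of each value from its column mean. The sketch below reuses the sample data from the example; the variable names (dx, dy, r_manual, r_pandas) are chosen here for illustration only.

```python
import pandas as pd

# Same sample data as the example above
df = pd.DataFrame({
    'StudyHours': [1, 2, 3, 4, 5],
    'Scores': [50, 60, 65, 80, 85],
})

x = df['StudyHours']
y = df['Scores']

# Deviation of each value from its column mean
dx = x - x.mean()
dy = y - y.mean()

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = (dx * dy).sum() / ((dx ** 2).sum() * (dy ** 2).sum()) ** 0.5

# pandas computes the same quantity for a pair of Series
r_pandas = x.corr(y)

print(r_manual, r_pandas)
```

Both values should agree up to floating-point rounding, which shows that corr() is the covariation of the two columns rescaled by their spreads.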

2. Covariance:

Measures how two variables vary together. Unlike correlation, the result is not scaled, so its size depends on the units of the variables.

• Value can be any number (positive/negative).

• Positive: both increase/decrease together.

• Negative: one increases while the other decreases.

Example:

print(df.cov())

Output:

            StudyHours  Scores
StudyHours         2.5    22.5
Scores            22.5   207.5

Positive covariance → both values grow together.
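The link between the two measures can be sketched as follows: dividing the covariance by both standard deviations gives the correlation, and changing the units of a column rescales the covariance but not the correlation. The "minutes" column is a hypothetical unit conversion added only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'StudyHours': [1, 2, 3, 4, 5],
    'Scores': [50, 60, 65, 80, 85],
})

# Sample covariance between the two columns (pandas divides by n - 1)
cov_xy = df['StudyHours'].cov(df['Scores'])

# Sample standard deviations (also n - 1 by default)
std_x = df['StudyHours'].std()
std_y = df['Scores'].std()

# Dividing the covariance by both standard deviations gives the correlation
r_from_cov = cov_xy / (std_x * std_y)

# Hypothetical unit change: hours -> minutes
minutes = df['StudyHours'] * 60
cov_minutes = minutes.cov(df['Scores'])   # 60 times larger than cov_xy
r_minutes = minutes.corr(df['Scores'])    # same as r_from_cov

print(cov_xy, cov_minutes)
print(r_from_cov, r_minutes)
```

This is why correlation is usually the easier number to interpret: it stays between -1 and +1 regardless of the units used.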

Unique Values, Value Counts, Membership


3. unique() – Finds distinct values

import pandas as pd

obj = pd.Series(['a', 'b', 'a', 'c', 'b', 'd', 'c'])

print(obj.unique())

Output:

['a' 'b' 'c' 'd']

4. value_counts() – Frequency of each value

print(obj.value_counts())

Output:

a    2
b    2
c    2
d    1

5. isin() – Checks if values exist in a list

mask = obj.isin(['a', 'c'])

print(mask)

print(obj[mask])

Output:

0     True
1    False
2     True
3     True
4    False
5    False
6     True

0    a
2    a
3    c
6    c
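A few closely related calls are often used alongside these three methods. The sketch below, on the same Series, shows nunique() for counting distinct values, value_counts(normalize=True) for relative frequencies, and the ~ operator for inverting an isin() mask. The variable names are illustrative.

```python
import pandas as pd

obj = pd.Series(['a', 'b', 'a', 'c', 'b', 'd', 'c'])

# Number of distinct values, without listing them
n_distinct = obj.nunique()

# Relative frequencies (fractions that sum to 1) instead of raw counts
shares = obj.value_counts(normalize=True)

# ~ inverts a boolean mask, keeping the rows that are NOT in the list
others = obj[~obj.isin(['a', 'c'])]

print(n_distinct)
print(shares)
print(others.tolist())
```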
6.1 Reading and Writing Data in Text Format (from pandas)

1. Reading Data from Text/CSV Files

You can load data from text files using pandas.read_csv().

Example:

import pandas as pd

df = pd.read_csv("sample_data.csv")

print(df.head())

• Reads a CSV file (comma-separated values).

• head() shows the first 5 rows.

• read_csv() can also read .txt files when the delimiter is specified, for example sep="\t" for tab-separated data.
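As a minimal sketch of reading delimited text that is not comma-separated, the example below parses tab-separated data with the sep parameter. The text content is hypothetical, and io.StringIO stands in for a file path so that the sketch runs without a file on disk.

```python
import io
import pandas as pd

# Hypothetical contents of a tab-separated .txt file
text = "Name\tAge\nShahul\t21\nRavi\t22\n"

# io.StringIO behaves like an open file; a real path string works the same way
df_txt = pd.read_csv(io.StringIO(text), sep="\t")

print(df_txt)
```

With a real file, the call would take the file name in place of the StringIO object and keep the same sep argument.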

2. Writing Data to Text/CSV Files

You can write DataFrames into files using to_csv().

Example:

df.to_csv("output.csv", index=False)

• Saves the DataFrame to a CSV file named output.csv.

• index=False leaves out the row index (the 0, 1, 2, ... labels), so the file contains only the data columns.
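The effect of the index argument is easiest to see side by side. In this sketch, to_csv() is called without a file path, in which case it returns the CSV text as a string instead of writing a file. The sample data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Shahul', 'Ravi'], 'Age': [21, 22]})

# With no path argument, to_csv() returns the CSV text as a string
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)      # first column holds the row index, with an empty header
print(without_index)   # only the Name and Age columns
```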

Small Example

Let's create and read a CSV file:

import pandas as pd

# Create sample data

data = {'Name': ['Shahul', 'Ravi'], 'Age': [21, 22]}

df = pd.DataFrame(data)

# Save to CSV

df.to_csv('students.csv', index=False)

# Read it back

df2 = pd.read_csv('students.csv')

print(df2)

Output:

Name Age

0 Shahul 21

1 Ravi 22
6.2 Interacting with Web APIs and Databases

1. Interacting with Web APIs

Web APIs return data (usually in JSON format) over the internet. We can fetch this data with the requests library and convert it into a pandas DataFrame.

Example – Getting data from a public API:

import requests

import pandas as pd

# Get data from an API

url = "https://jsonplaceholder.typicode.com/posts"

response = requests.get(url)

# Convert JSON data to pandas DataFrame

data = response.json()

df = pd.DataFrame(data)

print(df.head())

This fetches dummy blog post data and shows the first 5 entries.
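The conversion step can be tried without a network connection. The sketch below uses a small hand-written list of records, shaped like a typical JSON response (the field names and values are hypothetical), to show how pd.DataFrame treats a list of dictionaries and how pd.json_normalize expands nested objects into separate columns.

```python
import pandas as pd

# Hypothetical records shaped like a typical JSON API response
records = [
    {"id": 1, "title": "first post", "author": {"name": "Shahul"}},
    {"id": 2, "title": "second post", "author": {"name": "Ravi"}},
]

# Each dictionary becomes a row; a nested dict stays in a single column
plain = pd.DataFrame(records)

# json_normalize expands nested keys into dotted column names
normalized = pd.json_normalize(records)

print(plain.columns.tolist())
print(normalized.columns.tolist())
```

When an API returns flat records, as the posts example does, pd.DataFrame is enough; json_normalize is one option when the records contain nested objects.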

2. Interacting with Databases

Python can connect to many databases, such as SQLite, MySQL, and PostgreSQL, using libraries like sqlite3 or SQLAlchemy.

Example – Using SQLite (built-in database)

import sqlite3

import pandas as pd

# Connect to database (creates one if it doesn't exist)

conn = sqlite3.connect('mydata.db')

# Create table and insert data

conn.execute("CREATE TABLE IF NOT EXISTS students (name TEXT, age INTEGER)")

conn.execute("INSERT INTO students VALUES ('Shahul', 21), ('Ravi', 22)")

conn.commit()

# Read data into pandas

df = pd.read_sql_query("SELECT * FROM students", conn)

print(df)

conn.close()

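This saves and reads data using SQL. Because mydata.db persists on disk, running the script again inserts the same two rows a second time.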
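A variation on the same idea, offered as a sketch rather than the only way to do it: an in-memory database avoids leaving a file behind, and ? placeholders pass values separately from the SQL text. The table layout matches the example above; the rows list and the age filter are illustrative.

```python
import sqlite3
import pandas as pd

# ":memory:" creates a temporary database that disappears when the connection closes
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, age INTEGER)")

# ? placeholders let sqlite3 handle quoting of the values
rows = [("Shahul", 21), ("Ravi", 22)]
conn.executemany("INSERT INTO students VALUES (?, ?)", rows)
conn.commit()

# params fills the placeholder in the SELECT as well
df_sql = pd.read_sql_query(
    "SELECT name, age FROM students WHERE age > ?", conn, params=(21,)
)
conn.close()

print(df_sql)
```

Passing values through placeholders is generally safer than building the SQL string by hand, especially when the values come from user input.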
