Time Series Analysis Group 9

The document covers various data analysis techniques using Python, including time series analysis with Pandas, basic plotting with Matplotlib, and statistical measures such as frequency distribution and correlation. It also demonstrates building and validating linear and logistic regression models using synthetic data, showcasing evaluation metrics like Mean Squared Error and accuracy. The content is structured with code examples and outputs for practical understanding.

TIME SERIES ANALYSIS

import pandas as pd
import matplotlib.pyplot as plt

# Sample data (replace this with your own dataset)
data = {
    'date': pd.date_range(start='2022-01-01', end='2022-12-31'),
    'value': [i**2 for i in range(365)]  # Sample data: squares of numbers from 0 to 364
}

# Create DataFrame
df = pd.DataFrame(data)

# Convert 'date' column to datetime type and set as index
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)

# Basic time series analysis
print("Basic Time Series Analysis:")
print("----------------------------")
print("Data Summary:")
print(df.describe())

# Plot the time series
plt.figure(figsize=(10, 6))
plt.plot(df.index, df['value'], label='Value')
plt.title('Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True)
plt.show()

OUTPUT:

Basic Time Series Analysis:
----------------------------
Data Summary:
               value
count     365.000000
mean    44226.000000
std     39672.205705
min         0.000000
25%      8281.000000
50%     33124.000000
75%     74529.000000
max    132496.000000
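
As an optional extension (not part of the original program), the same indexed DataFrame could be resampled and smoothed; a minimal sketch, assuming the df defined above:

# Monthly mean of the daily series ('M' = month-end frequency; newer pandas versions prefer 'ME')
monthly_mean = df['value'].resample('M').mean()
print(monthly_mean.head())

# 7-day rolling average to smooth short-term fluctuations
rolling_mean = df['value'].rolling(window=7).mean()
plt.figure(figsize=(10, 6))
plt.plot(df.index, df['value'], label='Daily value')
plt.plot(df.index, rolling_mean, label='7-day rolling mean')
plt.legend()
plt.grid(True)
plt.show()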
Working with a Pandas DataFrame

import pandas as pd

# Create a dictionary containing student data
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'Sophia'],
    'Age': [25, 24, 26, 23, 27],
    'Grade': ['A', 'B', 'A', 'B', 'A']
}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)

# Print the DataFrame
print("Original DataFrame:")
print(df)

# Accessing specific columns
print("\nAccessing specific columns:")
print(df['Name'])
print(df['Age'])

# Accessing specific rows
print("\nAccessing specific rows:")
print(df.iloc[0])  # Accessing the first row using iloc
print(df.loc[1])   # Accessing the second row using loc

# Filtering data
print("\nFiltering data:")
print(df[df['Age'] > 24])  # Filtering students older than 24

# Adding a new column
df['Gender'] = ['M', 'F', 'M', 'F', 'F']
print("\nDataFrame after adding a new column:")
print(df)

# Deleting a column
df.drop(columns=['Grade'], inplace=True)
print("\nDataFrame after deleting the 'Grade' column:")
print(df)

OUTPUT:

Original DataFrame:
     Name  Age Grade
0    John   25     A
1    Anna   24     B
2   Peter   26     A
3   Linda   23     B
4  Sophia   27     A

Accessing specific columns:
0      John
1      Anna
2     Peter
3     Linda
4    Sophia
Name: Name, dtype: object
0    25
1    24
2    26
3    23
4    27
Name: Age, dtype: int64

Accessing specific rows:
Name     John
Age        25
Grade       A
Name: 0, dtype: object
Name     Anna
Age        24
Grade       B
Name: 1, dtype: object

Filtering data:
     Name  Age Grade
0    John   25     A
2   Peter   26     A
4  Sophia   27     A

DataFrame after adding a new column:
     Name  Age Grade Gender
0    John   25     A      M
1    Anna   24     B      F
2   Peter   26     A      M
3   Linda   23     B      F
4  Sophia   27     A      F

DataFrame after deleting the 'Grade' column:
     Name  Age Gender
0    John   25      M
1    Anna   24      F
2   Peter   26      M
3   Linda   23      F
4  Sophia   27      F
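
A brief clarification of the loc/iloc usage above: loc selects rows by index label, while iloc selects by integer position; they coincide in this example only because the default RangeIndex labels (0-4) equal the positions. A minimal sketch (an illustrative addition, not part of the original program) with a non-integer index, assuming the df above:

# Re-index a copy of df by the 'Name' column to make the label/position difference visible
df_by_name = df.set_index('Name')
print(df_by_name.loc['Anna'])   # label-based: the row whose index label is 'Anna'
print(df_by_name.iloc[1])       # position-based: the second row (also 'Anna' here)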
Basic Plots using Matplotlib

import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y1 = [2, 3, 5, 7, 11]
y2 = [1, 4, 9, 16, 25]

# Plotting a line plot
plt.figure(figsize=(8, 4))
plt.plot(x, y1, marker='o', color='blue', label='y1')  # Line plot for y1
plt.plot(x, y2, marker='s', color='red', label='y2')   # Line plot for y2
plt.title('Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.grid(True)
plt.show()

# Plotting a scatter plot
plt.figure(figsize=(8, 4))
plt.scatter(x, y1, color='green', label='y1')   # Scatter plot for y1
plt.scatter(x, y2, color='orange', label='y2')  # Scatter plot for y2
plt.title('Scatter Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.grid(True)
plt.show()

# Plotting a bar plot
plt.figure(figsize=(8, 4))
plt.bar(x, y1, color='purple', label='y1')  # Bar plot for y1
plt.bar(x, y2, color='pink', label='y2')    # Bar plot for y2
plt.title('Bar Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.grid(True)
plt.show()

OUTPUT: (the line plot, scatter plot, and bar plot are displayed)
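
One caveat about the bar plot above: both series are drawn at the same x positions, so the y2 bars are plotted on top of the y1 bars and can hide them. A minimal sketch (assuming the same x, y1, y2 as above) that offsets the two bar series side by side:

import numpy as np

x_pos = np.arange(len(x))  # numeric bar positions
width = 0.4                # width of each bar
plt.figure(figsize=(8, 4))
plt.bar(x_pos - width / 2, y1, width=width, color='purple', label='y1')
plt.bar(x_pos + width / 2, y2, width=width, color='pink', label='y2')
plt.xticks(x_pos, x)       # keep the original x values as tick labels
plt.title('Grouped Bar Plot')
plt.legend()
plt.grid(True)
plt.show()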
Frequency Distributions, Averages, Variability

import numpy as np

# Sample data
data = [12, 15, 18, 20, 22, 25, 28, 30, 32, 35, 38, 40, 42, 45, 48]

# Frequency distribution
def frequency_distribution(data):
    freq_dict = {}
    for item in data:
        if item in freq_dict:
            freq_dict[item] += 1
        else:
            freq_dict[item] = 1
    return freq_dict

freq_dict = frequency_distribution(data)
print("Frequency Distribution:")
for key, value in freq_dict.items():
    print(f"{key}: {value}")

# Calculating measures of central tendency: mean, median, mode
mean = np.mean(data)
median = np.median(data)
mode = max(freq_dict, key=freq_dict.get)
print("\nMeasures of Central Tendency:")
print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Mode: {mode}")

# Calculating measures of variability: range, variance, standard deviation
range_data = np.ptp(data)
variance = np.var(data)
std_dev = np.std(data)
print("\nMeasures of Variability:")
print(f"Range: {range_data}")
print(f"Variance: {variance}")
print(f"Standard Deviation: {std_dev}")

OUTPUT:

Frequency Distribution:
12: 1
15: 1
18: 1
20: 1
22: 1
25: 1
28: 1
30: 1
32: 1
35: 1
38: 1
40: 1
42: 1
45: 1
48: 1

Measures of Central Tendency:
Mean: 30.0
Median: 30.0
Mode: 12

Measures of Variability:
Range: 36
Variance: 118.13333333333334
Standard Deviation: 10.868915922636136
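
Note that np.var and np.std above use the population formulas (ddof=0, dividing by n). A minimal sketch, using the same data list, of the sample (Bessel-corrected) versions that divide by n - 1:

# Sample variance and standard deviation (ddof=1 divides by n - 1 instead of n)
sample_variance = np.var(data, ddof=1)
sample_std_dev = np.std(data, ddof=1)
print(f"Sample Variance: {sample_variance}")
print(f"Sample Standard Deviation: {sample_std_dev}")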
Normal Curves, Correlation and scatter plots, Correlation coefficient

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Generate random data for two variables
np.random.seed(0)
x = np.random.normal(loc=0, scale=1, size=100)
y = 2 * x + np.random.normal(loc=0, scale=1, size=100)

# Scatter plot
plt.figure(figsize=(8, 6))
plt.scatter(x, y, color='blue')
plt.title('Scatter Plot')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True)
plt.show()

# Correlation coefficient
correlation_coefficient = np.corrcoef(x, y)[0, 1]
print("Correlation Coefficient:", correlation_coefficient)

# Plotting normal curves for x and y
plt.figure(figsize=(10, 6))

# Normal curve for x
x_values = np.linspace(-4, 4, 100)
x_pdf = norm.pdf(x_values, loc=np.mean(x), scale=np.std(x))
plt.plot(x_values, x_pdf, label='X', color='blue')

# Normal curve for y
y_values = np.linspace(-8, 8, 100)
y_pdf = norm.pdf(y_values, loc=np.mean(y), scale=np.std(y))
plt.plot(y_values, y_pdf, label='Y', color='red')

plt.title('Normal Curves')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.grid(True)
plt.show()

OUTPUT: (the scatter plot and normal curves are displayed, and the correlation coefficient is printed)
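
The correlation coefficient printed by the program can be cross-checked against its definition, r = cov(x, y) / (std(x) * std(y)); a minimal sketch using the same x and y arrays:

# Manual Pearson correlation: covariance divided by the product of standard deviations
cov_xy = np.mean((x - np.mean(x)) * (y - np.mean(y)))
r_manual = cov_xy / (np.std(x) * np.std(y))
print("Manual correlation coefficient:", r_manual)  # should match np.corrcoef(x, y)[0, 1]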
Regression Python program
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Sample data
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1) # Reshape for single feature
y = np.array([2, 3, 5, 7, 11, 13, 17, 19, 23, 29])

# Create a linear regression model
model = LinearRegression()

# Train the model
model.fit(x, y)

# Make predictions
y_pred = model.predict(x)

# Plotting the original data and the regression line
plt.figure(figsize=(8, 6))
plt.scatter(x, y, color='blue', label='Actual Data')
plt.plot(x, y_pred, color='red', label='Regression Line')
plt.title('Linear Regression')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.grid(True)
plt.show()

# Coefficients
print("Intercept:", model.intercept_)
print("Slope:", model.coef_[0])

OUTPUT: (the scatter of actual data with the fitted regression line is displayed, and the intercept and slope are printed)
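
As a quick sanity check (not part of the original program), the printed intercept and slope can be reproduced with a direct degree-1 least-squares fit via np.polyfit on the same data:

# np.polyfit returns coefficients highest degree first: [slope, intercept]
slope, intercept = np.polyfit(x.squeeze(), y, 1)
print("polyfit slope:", slope)          # should match model.coef_[0]
print("polyfit intercept:", intercept)  # should match model.intercept_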
Building and validating linear models

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Generate synthetic data
np.random.seed(0)
X = np.random.rand(100, 1)  # Independent variable
y = 2.5 * X.squeeze() + np.random.normal(0, 0.5, 100)  # Dependent variable with some noise

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a linear regression model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = model.predict(X_test)

# Calculate evaluation metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error (MSE):", mse)
print("Coefficient of Determination (R^2):", r2)

OUTPUT:
Mean Squared Error (MSE): 0.22943831174285717
Coefficient of Determination (R^2): 0.559376074296551
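
A single train/test split can give a noisy estimate of performance. As a possible extension, a minimal sketch of k-fold cross-validation on the same X and y (5 folds chosen arbitrarily):

from sklearn.model_selection import cross_val_score

# 5-fold cross-validated R^2 scores for the same linear regression model
cv_scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring='r2')
print("Cross-validated R^2 scores:", cv_scores)
print("Mean R^2:", cv_scores.mean())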

Building and validating logistic models


import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Generate synthetic data
np.random.seed(0)
X = np.random.randn(100, 2)  # Independent variables
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # Binary target variable based on a simple condition

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a logistic regression model
model = LogisticRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = model.predict(X_test)

# Calculate evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", class_report)

OUTPUT:

Accuracy: 1.0
Confusion Matrix:
 [[13  0]
 [ 0  7]]
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        13
           1       1.00      1.00      1.00         7

    accuracy                           1.00        20
   macro avg       1.00      1.00      1.00        20
weighted avg       1.00      1.00      1.00        20
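
The perfect accuracy above is expected: the labels were generated by the linearly separable rule X[:, 0] + X[:, 1] > 0, which a logistic regression can reproduce exactly on this small test set. A minimal sketch (assuming the trained model above) of inspecting predicted probabilities rather than hard class labels:

# Predicted probability of class 1 for the first five test samples
probs = model.predict_proba(X_test)[:5, 1]
print("P(class = 1) for the first 5 test samples:", probs)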
