Time Series Analysis Group 9

The document covers various data analysis techniques using Python, including time series analysis with Pandas, basic plotting with Matplotlib, and statistical measures such as frequency distribution and correlation. It also demonstrates building and validating linear and logistic regression models using synthetic data, showcasing evaluation metrics like Mean Squared Error and accuracy. The content is structured with code examples and outputs for practical understanding.

TIME SERIES ANALYSIS

import pandas as pd
import matplotlib.pyplot as plt

# Sample data (replace this with your own dataset)
data = {
    'date': pd.date_range(start='2022-01-01', end='2022-12-31'),
    'value': [i**2 for i in range(365)]  # Sample data: squares of numbers from 0 to 364
}

# Create DataFrame
df = pd.DataFrame(data)

# Convert 'date' column to datetime type and set as index
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)

# Basic time series analysis
print("Basic Time Series Analysis:")
print("----------------------------")
print("Data Summary:")
print(df.describe())

# Plot the time series
plt.figure(figsize=(10, 6))
plt.plot(df.index, df['value'], label='Value')
plt.title('Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True)
plt.show()

OUTPUT:

Basic Time Series Analysis:
----------------------------
Data Summary:
               value
count     365.000000
mean    44226.000000
std     39672.205705
min         0.000000
25%      8281.000000
50%     33124.000000
75%     74529.000000
max    132496.000000
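
As an optional extension (not part of the original program), the same indexed DataFrame could be resampled and smoothed; a minimal sketch, assuming the df defined above:

# Monthly mean of the daily series ('M' = month-end frequency; newer pandas versions prefer 'ME')
monthly_mean = df['value'].resample('M').mean()
print(monthly_mean.head())

# 7-day rolling average to smooth short-term fluctuations
rolling_mean = df['value'].rolling(window=7).mean()
plt.figure(figsize=(10, 6))
plt.plot(df.index, df['value'], label='Daily value')
plt.plot(df.index, rolling_mean, label='7-day rolling mean')
plt.legend()
plt.grid(True)
plt.show()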
Working with a Pandas DataFrame

import pandas as pd

# Create a dictionary containing student data
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda', 'Sophia'],
    'Age': [25, 24, 26, 23, 27],
    'Grade': ['A', 'B', 'A', 'B', 'A']
}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)

# Print the DataFrame
print("Original DataFrame:")
print(df)

# Accessing specific columns
print("\nAccessing specific columns:")
print(df['Name'])
print(df['Age'])

# Accessing specific rows
print("\nAccessing specific rows:")
print(df.iloc[0])  # Accessing the first row using iloc
print(df.loc[1])   # Accessing the second row using loc

# Filtering data
print("\nFiltering data:")
print(df[df['Age'] > 24])  # Filtering students older than 24

# Adding a new column
df['Gender'] = ['M', 'F', 'M', 'F', 'F']
print("\nDataFrame after adding a new column:")
print(df)

# Deleting a column
df.drop(columns=['Grade'], inplace=True)
print("\nDataFrame after deleting the 'Grade' column:")
print(df)

OUTPUT:

Original DataFrame:
     Name  Age Grade
0    John   25     A
1    Anna   24     B
2   Peter   26     A
3   Linda   23     B
4  Sophia   27     A

Accessing specific columns:
0      John
1      Anna
2     Peter
3     Linda
4    Sophia
Name: Name, dtype: object
0    25
1    24
2    26
3    23
4    27
Name: Age, dtype: int64

Accessing specific rows:
Name     John
Age        25
Grade       A
Name: 0, dtype: object
Name     Anna
Age        24
Grade       B
Name: 1, dtype: object

Filtering data:
     Name  Age Grade
0    John   25     A
2   Peter   26     A
4  Sophia   27     A

DataFrame after adding a new column:
     Name  Age Grade Gender
0    John   25     A      M
1    Anna   24     B      F
2   Peter   26     A      M
3   Linda   23     B      F
4  Sophia   27     A      F

DataFrame after deleting the 'Grade' column:
     Name  Age Gender
0    John   25      M
1    Anna   24      F
2   Peter   26      M
3   Linda   23      F
4  Sophia   27      F
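
A brief clarification of the loc/iloc usage above: loc selects rows by index label, while iloc selects by integer position; they coincide in this example only because the default RangeIndex labels (0-4) equal the positions. A minimal sketch (an illustrative addition, not part of the original program) with a non-integer index, assuming the df above:

# Re-index a copy of df by the 'Name' column to make the label/position difference visible
df_by_name = df.set_index('Name')
print(df_by_name.loc['Anna'])   # label-based: the row whose index label is 'Anna'
print(df_by_name.iloc[1])       # position-based: the second row (also 'Anna' here)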
Basic Plots using Matplotlib

import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y1 = [2, 3, 5, 7, 11]
y2 = [1, 4, 9, 16, 25]

# Plotting a line plot
plt.figure(figsize=(8, 4))
plt.plot(x, y1, marker='o', color='blue', label='y1')  # Line plot for y1
plt.plot(x, y2, marker='s', color='red', label='y2')   # Line plot for y2
plt.title('Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.grid(True)
plt.show()

# Plotting a scatter plot
plt.figure(figsize=(8, 4))
plt.scatter(x, y1, color='green', label='y1')   # Scatter plot for y1
plt.scatter(x, y2, color='orange', label='y2')  # Scatter plot for y2
plt.title('Scatter Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.grid(True)
plt.show()

# Plotting a bar plot
plt.figure(figsize=(8, 4))
plt.bar(x, y1, color='purple', label='y1')  # Bar plot for y1
plt.bar(x, y2, color='pink', label='y2')    # Bar plot for y2
plt.title('Bar Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.grid(True)
plt.show()

OUTPUT: (the line plot, scatter plot, and bar plot are displayed)
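
One caveat about the bar plot above: both series are drawn at the same x positions, so the y2 bars are plotted on top of the y1 bars and can hide them. A minimal sketch (assuming the same x, y1, y2 as above) that offsets the two bar series side by side:

import numpy as np

x_pos = np.arange(len(x))  # numeric bar positions
width = 0.4                # width of each bar
plt.figure(figsize=(8, 4))
plt.bar(x_pos - width / 2, y1, width=width, color='purple', label='y1')
plt.bar(x_pos + width / 2, y2, width=width, color='pink', label='y2')
plt.xticks(x_pos, x)       # keep the original x values as tick labels
plt.title('Grouped Bar Plot')
plt.legend()
plt.grid(True)
plt.show()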
Frequency Distributions, Averages, Variability

import numpy as np

# Sample data
data = [12, 15, 18, 20, 22, 25, 28, 30, 32, 35, 38, 40, 42, 45, 48]

# Frequency distribution
def frequency_distribution(data):
    freq_dict = {}
    for item in data:
        if item in freq_dict:
            freq_dict[item] += 1
        else:
            freq_dict[item] = 1
    return freq_dict

freq_dict = frequency_distribution(data)
print("Frequency Distribution:")
for key, value in freq_dict.items():
    print(f"{key}: {value}")

# Calculating measures of central tendency: mean, median, mode
mean = np.mean(data)
median = np.median(data)
mode = max(freq_dict, key=freq_dict.get)
print("\nMeasures of Central Tendency:")
print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Mode: {mode}")

# Calculating measures of variability: range, variance, standard deviation
range_data = np.ptp(data)
variance = np.var(data)
std_dev = np.std(data)
print("\nMeasures of Variability:")
print(f"Range: {range_data}")
print(f"Variance: {variance}")
print(f"Standard Deviation: {std_dev}")

OUTPUT:

Frequency Distribution:
12: 1
15: 1
18: 1
20: 1
22: 1
25: 1
28: 1
30: 1
32: 1
35: 1
38: 1
40: 1
42: 1
45: 1
48: 1

Measures of Central Tendency:
Mean: 30.0
Median: 30.0
Mode: 12

Measures of Variability:
Range: 36
Variance: 118.13333333333334
Standard Deviation: 10.868915922636136
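
Note that np.var and np.std above use the population formulas (ddof=0, dividing by n). A minimal sketch, using the same data list, of the sample (Bessel-corrected) versions that divide by n - 1:

# Sample variance and standard deviation (ddof=1 divides by n - 1 instead of n)
sample_variance = np.var(data, ddof=1)
sample_std_dev = np.std(data, ddof=1)
print(f"Sample Variance: {sample_variance}")
print(f"Sample Standard Deviation: {sample_std_dev}")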
Normal Curves, Correlation and scatter plots, Correlation coefficient

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Generate random data for two variables
np.random.seed(0)
x = np.random.normal(loc=0, scale=1, size=100)
y = 2 * x + np.random.normal(loc=0, scale=1, size=100)

# Scatter plot
plt.figure(figsize=(8, 6))
plt.scatter(x, y, color='blue')
plt.title('Scatter Plot')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True)
plt.show()

# Correlation coefficient
correlation_coefficient = np.corrcoef(x, y)[0, 1]
print("Correlation Coefficient:", correlation_coefficient)

# Plotting normal curves for x and y
plt.figure(figsize=(10, 6))

# Normal curve for x
x_values = np.linspace(-4, 4, 100)
x_pdf = norm.pdf(x_values, loc=np.mean(x), scale=np.std(x))
plt.plot(x_values, x_pdf, label='X', color='blue')

# Normal curve for y
y_values = np.linspace(-8, 8, 100)
y_pdf = norm.pdf(y_values, loc=np.mean(y), scale=np.std(y))
plt.plot(y_values, y_pdf, label='Y', color='red')

plt.title('Normal Curves')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.grid(True)
plt.show()

OUTPUT: (the scatter plot and normal curves are displayed, and the correlation coefficient is printed)
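
The correlation coefficient printed by the program can be cross-checked against its definition, r = cov(x, y) / (std(x) * std(y)); a minimal sketch using the same x and y arrays:

# Manual Pearson correlation: covariance divided by the product of standard deviations
cov_xy = np.mean((x - np.mean(x)) * (y - np.mean(y)))
r_manual = cov_xy / (np.std(x) * np.std(y))
print("Manual correlation coefficient:", r_manual)  # should match np.corrcoef(x, y)[0, 1]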
Regression Python program
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Sample data
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1) # Reshape for single feature
y = np.array([2, 3, 5, 7, 11, 13, 17, 19, 23, 29])

# Create a linear regression model
model = LinearRegression()

# Train the model
model.fit(x, y)

# Make predictions
y_pred = model.predict(x)

# Plotting the original data and the regression line
plt.figure(figsize=(8, 6))
plt.scatter(x, y, color='blue', label='Actual Data')
plt.plot(x, y_pred, color='red', label='Regression Line')
plt.title('Linear Regression')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.grid(True)
plt.show()

# Coefficients
print("Intercept:", model.intercept_)
print("Slope:", model.coef_[0])

OUTPUT: (the scatter of actual data with the fitted regression line is displayed, and the intercept and slope are printed)
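
As a quick sanity check (not part of the original program), the printed intercept and slope can be reproduced with a direct degree-1 least-squares fit via np.polyfit on the same data:

# np.polyfit returns coefficients highest degree first: [slope, intercept]
slope, intercept = np.polyfit(x.squeeze(), y, 1)
print("polyfit slope:", slope)          # should match model.coef_[0]
print("polyfit intercept:", intercept)  # should match model.intercept_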
Building and validating linear models

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Generate synthetic data
np.random.seed(0)
X = np.random.rand(100, 1)  # Independent variable
y = 2.5 * X.squeeze() + np.random.normal(0, 0.5, 100)  # Dependent variable with some noise

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a linear regression model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = model.predict(X_test)

# Calculate evaluation metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error (MSE):", mse)
print("Coefficient of Determination (R^2):", r2)

OUTPUT:
Mean Squared Error (MSE): 0.22943831174285717
Coefficient of Determination (R^2): 0.559376074296551
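
A single train/test split can give a noisy estimate of performance. As a possible extension, a minimal sketch of k-fold cross-validation on the same X and y (5 folds chosen arbitrarily):

from sklearn.model_selection import cross_val_score

# 5-fold cross-validated R^2 scores for the same linear regression model
cv_scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring='r2')
print("Cross-validated R^2 scores:", cv_scores)
print("Mean R^2:", cv_scores.mean())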

Building and validating logistic models


import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Generate synthetic data
np.random.seed(0)
X = np.random.randn(100, 2)  # Independent variables
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # Binary target variable based on a simple condition

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a logistic regression model
model = LogisticRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = model.predict(X_test)

# Calculate evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion Matrix:\n", conf_matrix)
print("Classification Report:\n", class_report)

OUTPUT:

Accuracy: 1.0
Confusion Matrix:
 [[13  0]
 [ 0  7]]
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        13
           1       1.00      1.00      1.00         7

    accuracy                           1.00        20
   macro avg       1.00      1.00      1.00        20
weighted avg       1.00      1.00      1.00        20
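
The perfect accuracy above is expected: the labels were generated by the linearly separable rule X[:, 0] + X[:, 1] > 0, which a logistic regression can reproduce exactly on this small test set. A minimal sketch (assuming the trained model above) of inspecting predicted probabilities rather than hard class labels:

# Predicted probability of class 1 for the first five test samples
probs = model.predict_proba(X_test)[:5, 1]
print("P(class = 1) for the first 5 test samples:", probs)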
