FDS Lab Manual


Ex. No.: 1
WORKING WITH PANDAS AND DATAFRAMES
Date:

AIM:

To work with a pandas DataFrame using a Python program.

ALGORITHM:

Step1: Start the program

Step2: Import numpy and pandas module

Step3: Create a dataframe using the dictionary

Step4: Print the output

Step5: Stop the program

PROGRAM:

import pandas as pd

data = [["Nila", 25, "India"],
        ["Chithra", 30, "Singapore"],
        ["Gowthami", 35, "Paris"]]
df = pd.DataFrame(data, columns=["Name", "Age", "City"])
print(df)

THEORY:

Pandas is a powerful Python library used for data manipulation and analysis. It provides two
primary data structures:

Series: A one-dimensional labeled array, similar to a column in a spreadsheet.

DataFrame: A two-dimensional labeled data structure with columns of potentially different types,
similar to a table in a database or a spreadsheet.
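
For instance, a short sketch contrasting the two structures (the values here are illustrative, not taken from the experiment above):

import pandas as pd

# Series: one-dimensional labeled data (ages keyed by name)
ages = pd.Series([25, 30, 35], index=["Nila", "Chithra", "Gowthami"])
print(ages)

# DataFrame: two-dimensional labeled data, here built from a dictionary
df = pd.DataFrame({"Name": ["Nila", "Chithra"], "Age": [25, 30]})
print(df)
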
OUTPUT:

RESULT:

Thus the program to work with a pandas DataFrame using Python has been executed and the output was verified successfully.

Ex. No.: 2
WORKING WITH NUMPY ARRAYS
Date:
AIM:

To work with NumPy arrays using Python.

ALGORITHM:

Step1: Start the program

Step2: Import numpy modules

Step3: Print the basic characteristics and operations of array

Step4: Stop the program

THEORY:

NumPy (Numerical Python) is a powerful library for numerical computations in Python. It provides support for arrays, which are efficient, multidimensional containers for large datasets of homogeneous types. These arrays enable a wide range of mathematical operations to be performed on large datasets with high performance.
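
As a quick illustration of this vectorized behaviour (a minimal sketch, separate from the program below):

import numpy as np

# Arithmetic applies element-wise to the whole array, with no Python loop
arr = np.array([1, 2, 3, 4])
print(arr * 2)       # [2 4 6 8]
print(arr + arr)     # [2 4 6 8]
print(arr.mean())    # 2.5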

PROGRAM:

(A) Create a NumPy array of zeros and ones

import numpy as np

zeros_array = np.zeros((3, 3))
ones_array = np.ones((2, 4))

print("Zeros Array:")
print(zeros_array)
print("Ones Array:")
print(ones_array)

OUTPUT:

Zeros Array:
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

Ones Array:
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]
(B) Generate a Random NumPy Array

import numpy as np

random_array = np.random.rand(3, 3)
print("Random Array:")
print(random_array)

OUTPUT:

Random Array:

[[0.22550229 0.75346623 0.18106636]
 [0.36264152 0.87597885 0.53328317]
 [0.96555117 0.79383112 0.32111982]]

(C) Perform element-wise addition, subtraction, multiplication, and division

import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

addition = np.add(arr1, arr2)
subtraction = np.subtract(arr1, arr2)
multiplication = np.multiply(arr1, arr2)
division = np.divide(arr1, arr2)

print("Addition:")
print(addition)
print("Subtraction:")
print(subtraction)
print("Multiplication:")
print(multiplication)
print("Division:")
print(division)
OUTPUT:

Addition:
[[ 6  8]
 [10 12]]

Subtraction:
[[-4 -4]
 [-4 -4]]

Multiplication:
[[ 5 12]
 [21 32]]

Division:
[[0.2        0.33333333]
 [0.42857143 0.5       ]]

(D) Transpose a NumPy array

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
transposed_arr = np.transpose(arr)

print("Original Array:")
print(arr)
print("Transposed Array:")
print(transposed_arr)

OUTPUT:

Original Array:

[[1 2 3]

[4 5 6]]
Transposed Array:

[[1 4]

[2 5]

[3 6]]

(E) Find the index of the maximum and minimum element along an axis

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
max_index_axis0 = np.argmax(arr, axis=0)
min_index_axis1 = np.argmin(arr, axis=1)

print("Index of Maximum Element (along axis 0):", max_index_axis0)
print("Index of Minimum Element (along axis 1):", min_index_axis1)

OUTPUT:

Index of Maximum Element (along axis 0): [1 1 1]

Index of Minimum Element (along axis 1): [0 0]


RESULT:

Thus the implementation of Python functions using NumPy arrays has been done successfully.

Ex. No.: 3
BASIC PLOTS USING MATPLOTLIB
Date:

AIM:

To create basic plots using Matplotlib to visualize data distributions and trends.

ALGORITHM:

Step1: Start the program

Step2: Import Matplotlib Module

Step3: Create basic plots using Matplotlib

Step4: Print the output

Step5: Stop the program

THEORY:

Line plots display data points connected by a line and are often used to show trends over time. Scatter plots show the relationship between two variables, with each point representing an observation. Histograms display the distribution of a dataset by grouping data into bins and counting the number of observations in each bin. Bar plots show the count or value of different categories.

PROGRAM:

import matplotlib.pyplot as plt
import numpy as np

# Line plot
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y)
plt.title('Sine Wave')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

# Scatter plot
x = np.random.rand(100)
y = np.random.rand(100)
plt.scatter(x, y)
plt.title('Scatter Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

# Bar plot
categories = ['A', 'B', 'C', 'D']
values = [4, 7, 1, 8]
plt.bar(categories, values)
plt.title('Bar Plot')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.show()

# Histogram
data = np.random.randn(1000)
plt.hist(data, bins=30, edgecolor='black')
plt.title('Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

# Pie chart
labels = ['A', 'B', 'C', 'D']
sizes = [15, 30, 45, 10]
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=140)
plt.title('Pie Chart')
plt.show()

# Create plot with different styles
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
plt.plot(x, y1, label='Sine', color='blue', linestyle='--')
plt.plot(x, y2, label='Cosine', color='red', linestyle='-.')
# Add titles, labels, and legend
plt.title('Sine and Cosine Waves')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
# Show plot
plt.show()

# Sine wave with grid
plt.plot(x, y1)
plt.title('Sine Wave with Grid')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.grid(True)
plt.show()

# Data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
# Create subplots
fig, axs = plt.subplots(2)
# Plot on each subplot
axs[0].plot(x, y1, 'r')
axs[0].set_title('Sine')
axs[1].plot(x, y2, 'b')
axs[1].set_title('Cosine')
# Adjust layout
plt.tight_layout()
plt.show()
OUTPUT:

RESULT:

Thus the basic plots using Matplotlib have been created successfully.

Ex. No.: 4
FREQUENCY DISTRIBUTION, AVERAGES AND VARIABILITY
Date:

AIM:

To understand the distribution of values within a dataset and to identify the frequency of each unique value, the central tendency, and the dispersion of the dataset around the central tendency.

ALGORITHM:

Step1: Start the program

Step2: Create an empty dictionary to store the frequency distribution.


Step3: Return the frequency distribution dictionary.

Step4: Add up all the numbers in the dataset.

Step5: Divide the sum by the total count of numbers.

Step6: Sort the dataset from smallest to largest.

Step7: Count how many times each number appears in the dataset.

Step8: Analyze the results for frequency distributions, averages, and variability.

Step9: Stop the program

THEORY:

A frequency distribution is a summary of how often different values occur within a data set.
Averages are measures of central tendency that summarize a set of data with a single value
representing the center of the data. Variability refers to how spread out the data values are in a data
set.
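
The same quantities can also be computed by hand, as the algorithm describes; a minimal sketch using only built-ins and collections.Counter (an alternative to the statistics-module program below):

from collections import Counter

data = [10, 20, 30, 40, 50, 20, 30, 40, 20, 10]

# Frequency distribution by counting occurrences (Steps 2-3, 7)
freq = Counter(data)            # Counter({20: 3, 10: 2, 30: 2, 40: 2, 50: 1})

# Mean as "sum divided by count" (Steps 4-5)
mean = sum(data) / len(data)    # 27.0
print(freq, mean)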

PROGRAM:

import statistics

def frequency_distribution(data):
    freq_dist = {}
    for item in data:
        if item in freq_dist:
            freq_dist[item] += 1
        else:
            freq_dist[item] = 1
    return freq_dist

def calculate_mean(data):
    return statistics.mean(data)

def calculate_median(data):
    return statistics.median(data)

def calculate_mode(data):
    return statistics.mode(data)

def calculate_range(data):
    return max(data) - min(data)

def calculate_variance(data):
    return statistics.variance(data)

def calculate_std_dev(data):
    return statistics.stdev(data)

data = [10, 20, 30, 40, 50, 20, 30, 40, 20, 10]

freq_dist = frequency_distribution(data)
print("Frequency Distribution:", freq_dist)
mean = calculate_mean(data)
print("Mean:", mean)
median = calculate_median(data)
print("Median:", median)
mode = calculate_mode(data)
print("Mode:", mode)
data_range = calculate_range(data)
print("Range:", data_range)
variance = calculate_variance(data)
print("Variance:", variance)
std_dev = calculate_std_dev(data)
print("Standard Deviation:", std_dev)

OUTPUT:

Frequency Distribution: {10: 2, 20: 3, 30: 2, 40: 2, 50: 1}

Mean: 27

Median: 25.0

Mode: 20

Range: 40

Variance: 178.88888888888889
Standard Deviation: 13.37493509849258

RESULT:

Thus the implementation of frequency distribution, averages, and variability has been done successfully.

Ex. No.: 5
NORMAL CURVES, CORRELATION AND SCATTER PLOTS, CORRELATION COEFFICIENT
Date:

AIM:

To visualize the distribution of data points along a bell-shaped curve, to visually represent the relationship between two variables by plotting their data points, and to quantify the strength and direction of the linear relationship between two variables.

ALGORITHM:

Step1: Start the Program

Step2: Normal Curves: Generate an array of evenly spaced x-values ranging from min to max with n points.

Step3: Scatter Plot: Plot the x and y data points on a 2D graph.

Label the x and y axes appropriately.

Add a title to the graph.

Optionally, add gridlines for better visualization.

Display the scatter plot.

Step4: Correlation Coefficient: Calculate the means; calculate the deviations from the means; multiply the deviations for each pair of (x, y) values; sum up the products; calculate the standard deviations; divide the sum of products by the product of the standard deviations. The result is the correlation coefficient.
Step5: Stop the Program

THEORY:

A normal curve (or Gaussian distribution) is a bell-shaped curve that represents the distribution of a
set of data. It's symmetric around the mean and characterized by its mean (μ) and standard
deviation (σ). A scatter plot is a graph that shows the relationship between two variables using
Cartesian coordinates. The correlation coefficient (often denoted as r) quantifies the strength and
direction of the linear relationship between two variables.
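
Step 4 of the algorithm can be written out directly; a minimal NumPy sketch of the correlation coefficient from means, deviations, and products (the data points are illustrative):

import numpy as np

def pearson_r(x, y):
    # Deviations from the means, their products, and normalization (Step 4 above)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]))   # about 0.96: strong positive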

PROGRAM:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm, pearsonr

# Generate some data
np.random.seed(0)
mean = 0
std_dev = 1
data = np.random.normal(mean, std_dev, 1000)

# Plotting the normal curve
plt.figure(figsize=(10, 6))
sns.histplot(data, kde=True, stat='density', bins=30, label='Data Histogram')
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = norm.pdf(x, mean, std_dev)
plt.plot(x, p, 'k', linewidth=2, label='Normal Distribution PDF')
plt.title('Normal Distribution Curve')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.show()

# Generate data for scatter plot and correlation
np.random.seed(0)
x = np.random.rand(100)
y = 2 * x + 1 + np.random.normal(0, 0.1, 100)

# Scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(x, y, label='Data points')
plt.title('Scatter Plot')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()

# Calculate and display the correlation coefficient
corr_coeff, _ = pearsonr(x, y)
print(f'Correlation Coefficient: {corr_coeff:.2f}')

# Display the scatter plot with regression line
sns.regplot(x=x, y=y, ci=None)
plt.title('Scatter Plot with Regression Line')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()

OUTPUT:

RESULT:

Thus the implementation of normal curves, correlation and scatter plots, and the correlation coefficient has been done successfully.

Ex. No.: 6
Z-TEST
Date:

AIM:
To implement the z-test, which determines whether the mean of a sample is statistically significantly different from the known or hypothesized population mean.

ALGORITHM:

Step 1: Define the two groups of data.

Step 2: Calculate means, standard deviations, and lengths for each group.

Step 3: Calculate the standard error and z-score.

Step 4: Calculate the p-value using the z-score.

Step 5: Print the results (means, standard deviations, z-score, p-value).

Step 6: Test the hypothesis based on the p-value and a significance level of 0.05.

Step 7: Plot a histogram for both groups, with labels, title, and legend.

THEORY:

A Z-test is a statistical test used to determine whether there is a significant difference between
sample and population means or between the means of two samples. It is applicable when the data
is approximately normally distributed and the sample size is large (typically n>30).
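
For the one-sample case mentioned in the aim, the z-score is the difference between the sample mean and the hypothesized mean divided by the standard error; a minimal sketch assuming a known population standard deviation (the numbers are illustrative):

import numpy as np
from scipy import stats

sample = np.array([52, 48, 55, 50, 53, 49, 51, 54])
mu0, sigma = 50, 2.5                        # hypothesized mean and assumed known sigma
z = (sample.mean() - mu0) / (sigma / np.sqrt(len(sample)))
p_value = 2 * stats.norm.sf(abs(z))         # two-tailed p-value
print(z, p_value)                           # about 1.70 and 0.09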

PROGRAM:

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

group1 = [21, 21.5, 22, 22.5, 23, 24.5, 25, 27, 30, 25]
group2 = [31, 31.5, 33, 33.5, 35, 35.5, 37, 38, 37, 40]

mean1, mean2 = np.mean(group1), np.mean(group2)
std1, std2 = np.std(group1, ddof=1), np.std(group2, ddof=1)
n1, n2 = len(group1), len(group2)

se = np.sqrt(std1**2/n1 + std2**2/n2)
z = (mean1 - mean2) / se
p_value = stats.norm.sf(abs(z)) * 2

print(f"Group 1 mean: {mean1}")
print(f"Group 2 mean: {mean2}")
print(f"Group 1 standard deviation: {std1}")
print(f"Group 2 standard deviation: {std2}")
print(f"Z-score: {z}")
print(f"P-value: {p_value}")

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the means.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference between the means.")

plt.hist(group1, alpha=0.5, label='Group 1')
plt.hist(group2, alpha=1, label='Group 2')
plt.legend(loc='upper right')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram of Group 1 and Group 2')
plt.show()

OUTPUT:
RESULT:

Thus the program has been implemented successfully.

Ex. No.: 7
T-TEST
Date:

AIM:

To implement the t-test.

ALGORITHM:

Step 1: Define helper functions for variance checking and performing t-tests.
Step 2: Generate random data for one-sample, independent two-sample, and paired scenarios.

Step 3: Check variance and perform one-sample t-test.

Step 4: Check variances and perform independent two-sample t-test.

Step 5: Check variances and perform paired t-test.

Step 6: Visualize the distributions of the two independent groups using histograms with labels,
title, and legend.

THEORY:

A t-test is a statistical test used to determine if there is a significant difference between the means of
two groups. It helps to ascertain whether the observed differences in sample means are likely to
reflect real differences in the populations from which the samples were drawn, or if they might
have occurred by random chance.

Types of t-tests:

1. Independent Samples t-test: Compares the means of two independent groups (e.g., treatment
vs. control groups).
2. Paired Samples t-test: Compares the means of the same group at two different times (e.g.,
before and after treatment).
3. One-Sample t-test: Compares the sample mean to a known value (e.g., a population mean).
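
All three variants are available in scipy.stats; a minimal sketch with illustrative data (not the data used in the program below):

import numpy as np
from scipy import stats

a = np.array([20.1, 22.3, 19.8, 21.5, 20.9])
b = np.array([23.2, 24.1, 22.8, 25.0, 23.7])

print(stats.ttest_ind(a, b))         # 1. independent samples
print(stats.ttest_rel(a, b))         # 2. paired samples (equal lengths required)
print(stats.ttest_1samp(a, 21.0))    # 3. one sample against a known value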

PROGRAM:

import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

def check_variance(group, threshold=1e-5):
    variance = np.var(group)
    if variance < threshold:
        print(f"Warning: Very low variance detected (Variance: {variance}). Results may be unreliable.")
    return variance

def perform_t_tests(group1, group2=None, test_type='independent'):
    if test_type == 'one-sample':
        population_mean = group2
        t_stat, p_value = stats.ttest_1samp(group1, population_mean)
        print(f"One-Sample T-Test:\nT-statistic: {t_stat}, P-value: {p_value}")
    elif test_type == 'independent':
        t_stat, p_value = stats.ttest_ind(group1, group2)
        print(f"Independent Two-Sample T-Test:\nT-statistic: {t_stat}, P-value: {p_value}")
    elif test_type == 'paired':
        t_stat, p_value = stats.ttest_rel(group1, group2)
        print(f"Paired T-Test:\nT-statistic: {t_stat}, P-value: {p_value}")
    alpha = 0.05
    if p_value < alpha:
        print("Reject the null hypothesis: The means are significantly different.")
    else:
        print("Fail to reject the null hypothesis: The means are not significantly different.")
    print()

np.random.seed(42)
one_sample_data = np.random.normal(25, 0.01, 30)
population_mean = 25
group1 = np.random.normal(25, 5, 30)
group2 = np.random.normal(30, 5, 30)
before = np.random.normal(25, 5, 30)
after = before + np.random.normal(0, 5, 30)

check_variance(one_sample_data)
perform_t_tests(one_sample_data, population_mean, 'one-sample')
check_variance(group1)
check_variance(group2)
perform_t_tests(group1, group2, 'independent')
check_variance(before)
check_variance(after)
perform_t_tests(before, after, 'paired')

sns.histplot(group1, kde=True, label='Group 1', color='blue')
sns.histplot(group2, kde=True, label='Group 2', color='red')
plt.legend()
plt.title('Distribution of Group 1 and Group 2')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()


OUTPUT:

RESULT:
Thus the implementation of the t-test has been done successfully.

Ex. No.: 8
ANOVA
Date:

AIM:

To implement ANOVA (Analysis of Variance) to determine whether there is a significant difference between the means of three or more groups.

ALGORITHM:
Step1: Start the program
Step2: Formulate Hypotheses
Step3: Set Significance Level
Step4: Calculate Group Means
Step5: Calculate Overall Mean
Step6: Calculate Sum of Squares
Step7: Calculate F-Statistic
Step8: Determine Critical Value or P-value
Step9: Make Decision
Step10: Stop the program

THEORY:

ANOVA, or Analysis of Variance, is a statistical technique used to determine if there are significant differences between the means of three or more groups. It helps to test hypotheses about whether the group means are all equal or if at least one group mean is different from the others.

Types of ANOVA:

1. One-Way ANOVA: Tests for differences among group means in a single factor (e.g.,
comparing test scores of students across different teaching methods).
2. Two-Way ANOVA: Examines the effect of two different factors simultaneously and can
also assess the interaction between the two factors (e.g., studying the effect of teaching
method and study time on test scores).
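
Steps 4 to 7 of the algorithm amount to comparing between-group and within-group sums of squares; a minimal hand-rolled sketch of the one-way F-statistic (illustrative groups; this is the same quantity scipy.stats.f_oneway returns):

import numpy as np

def one_way_f(*groups):
    # Between-group and within-group sums of squares (Steps 4-7 above)
    all_data = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    grand_mean = all_data.mean()
    k, n = len(groups), len(all_data)
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

print(one_way_f([20, 21, 19], [25, 26, 24], [30, 29, 31]))   # 75.0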

PROGRAM:

import numpy as np
import scipy.stats as stats
import statsmodels.api as sm
from statsmodels.formula.api import ols
import pandas as pd

# Example data
# Three groups with different means
np.random.seed(0)
group1 = np.random.normal(20, 5, 30)
group2 = np.random.normal(22, 5, 30)
group3 = np.random.normal(23, 5, 30)

# Combine the data into a single dataframe
data = pd.DataFrame({
    'value': np.concatenate([group1, group2, group3]),
    'group': np.repeat(['group1', 'group2', 'group3'], repeats=30)
})

# Perform ANOVA using statsmodels
model = ols('value ~ C(group)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

f_val, p_val = stats.f_oneway(group1, group2, group3)
print(f"F-statistic: {f_val}, P-value: {p_val}")

OUTPUT:

RESULT:
Thus the implementation of ANOVA (Analysis of Variance) has been done successfully.

Ex. No.: 9
BUILDING AND VALIDATING LINEAR MODELS
Date:

AIM:

To implement building and validating linear models.

ALGORITHM:

Step 1: Import Libraries: Import necessary libraries for NumPy, Matplotlib, and scikit-learn.

Step 2: Generate Data: Set the random seed. Generate feature matrix X and target vector y.

Step 3: Split Data: Split the data into training and test sets.

Step 4: Train Model: Initialize and train a linear regression model using the training data.

Step 5: Predict: Predict target values for the test data.

Step 6: Evaluate Model: Calculate and print the Mean Squared Error (MSE).

Step 7: Visualize Results: Plot actual vs. predicted values.

PROGRAM:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', label='Predicted')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Actual vs Predicted')
plt.legend()
plt.show()

OUTPUT:

RESULT:

Thus the implementation of building and validating linear models has been done successfully.

Ex. No.: 10
BUILDING AND VALIDATING LOGISTIC MODELS
Date:

AIM:

To implement building and validating logistic models.


ALGORITHM:

Step 1: Import Libraries: Import numpy, matplotlib.pyplot, and scikit-learn modules.

Step 2: Generate Data: Set the random seed. Generate feature matrix X and target vector y using random values.

Step 3: Split Data: Split the data into training and test sets using train_test_split.

Step 4: Train Model: Initialize and train the regression model.

Step 5: Predict: Use the trained model to predict target values for the test data.

Step 6: Evaluate Model: Calculate and print the Mean Squared Error (MSE) using mean_squared_error.

Step 7: Visualize Results: Plot actual vs. predicted values using matplotlib.

THEORY:

Building and validating logistic regression models involves preparing the data, specifying the
logistic function, estimating model parameters, and fitting the model. Model validation ensures that
the model performs well on new, unseen data by evaluating it using various metrics,
cross-validation, and diagnostic tests. Regularization techniques are employed to mitigate overfitting
and improve generalizability.
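
Since the program below uses a locally weighted (regression-based) fit, a minimal sketch of fitting and validating an actual logistic model, assuming scikit-learn and synthetic binary data, could look like:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic binary labels (assumed for illustration only)
np.random.seed(0)
X = np.random.rand(200, 1)
y = (X.ravel() + 0.1 * np.random.randn(200) > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = LogisticRegression().fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))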

PROGRAM:

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import cdist

def locally_weighted_regression(x_query, X, y, tau):
    """
    Perform Locally Weighted Regression for a query point x_query.

    Parameters:
    x_query : float
        The query point at which to predict the value.
    X : numpy.ndarray
        The input features (training data).
    y : numpy.ndarray
        The target values (training data).
    tau : float
        The bandwidth parameter for the weighting function.

    Returns:
    float
        The predicted value at the query point.
    """
    # Add intercept term to X and x_query
    m = X.shape[0]
    X_ = np.hstack((np.ones((m, 1)), X.reshape(-1, 1)))
    x_query_ = np.array([1, x_query]).reshape(1, -1)
    # Compute weights
    distances = cdist(X_, x_query_, metric='euclidean').flatten()
    weights = np.exp(-distances**2 / (2 * tau**2))
    # Perform weighted linear regression
    W = np.diag(weights)
    theta = np.linalg.inv(X_.T @ W @ X_) @ X_.T @ W @ y
    return x_query_ @ theta

# Generate synthetic dataset
np.random.seed(42)
X = np.linspace(0, 10, 100)
y = np.sin(X) + np.random.normal(scale=0.1, size=X.shape)

# Fit Locally Weighted Regression for a range of query points
tau = 0.5  # Bandwidth parameter
x_query_points = np.linspace(0, 10, 100)
y_pred = np.array([locally_weighted_regression(x_query, X, y, tau) for x_query in x_query_points])

# Plot the results
plt.figure(figsize=(10, 6))
plt.scatter(X, y, label='Training Data', color='blue')
plt.plot(x_query_points, y_pred, label='LWR Predictions', color='red')
plt.title('Locally Weighted Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()

OUTPUT:
RESULT:
Thus the implementation of building and validating logistic models has been done successfully.

Ex. No.: 11
TIME SERIES ANALYSIS
Date:

AIM:

To implement time series analysis using the pandas and Matplotlib libraries.

ALGORITHM:

Step 1: Import Libraries: Import pandas and matplotlib.pyplot.

Step 2: Create Data: Create a dictionary with a date range and corresponding values.

Step 3: Create DataFrame: Convert the dictionary to a DataFrame. Convert the date column to datetime. Set the date column as the index.

Step 4: Plot Data: Plot the time series data. Customize the plot with labels, title, legend, and grid.

Step 5: Display Plot: Show the plot.

Step 6: Generate Summary Statistics: Print summary statistics of the DataFrame. Print the first few rows of the DataFrame.

THEORY:

Time series analysis involves methods for analyzing time-ordered data points to extract
meaningful statistics, identify patterns, and forecast future values. It is widely used in
various fields, such as economics, finance, environmental science, and engineering, where
understanding temporal dynamics is crucial.

Components of a Time Series:

• Trend: The long-term progression of the series (upward, downward, or stationary).


• Seasonality: Regular, repeating patterns or cycles in the data (e.g., monthly sales due to
holidays).
• Cyclicality: Irregular, longer-term oscillations (e.g., economic cycles).
• Noise (Residual): Random variation or irregular fluctuations in the data.

Stationarity:

• Definition: A stationary time series has statistical properties, such as mean and variance,
that are constant over time.
• Importance: Many time series models assume stationarity. Non-stationary series often
need to be transformed to stationary (e.g., differencing).

Autocorrelation:

• Definition: The correlation of a time series with its own past values.
• ACF and PACF: Autocorrelation Function (ACF) and Partial Autocorrelation Function
(PACF) help identify the presence of autocorrelation and determine appropriate model
parameters.
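
A short sketch of these two ideas with pandas (synthetic random-walk data assumed): differencing a non-stationary series and checking lag-1 autocorrelation with Series.autocorr:

import numpy as np
import pandas as pd

# A random walk is non-stationary; its first difference is stationary
np.random.seed(0)
s = pd.Series(np.cumsum(np.random.randn(100)),
              index=pd.date_range("2024-01-01", periods=100))
print(s.autocorr(lag=1))                   # high: strong lag-1 autocorrelation
print(s.diff().dropna().autocorr(lag=1))   # near zero after differencing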

PROGRAM:

import pandas as pd
import matplotlib.pyplot as plt

data = {
    'date': pd.date_range(start='2024-01-01', periods=5),
    'value': [10, 12, 15, 18, 20]
}
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)

plt.figure(figsize=(10, 6))
plt.plot(df.index, df['value'], label='Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.title('Time Series Data')
plt.legend()
plt.grid(True)
plt.show()

print("Summary Statistics:")
print(df.describe())
print("\nOutput:")
print(df.head())

OUTPUT:
RESULT:
Thus the implementation of time series analysis has been done successfully.

Ex. No.: 12
REGRESSION
Date:

AIM:

To model and predict the value of the dependent variable based on the values of the independent
variables.

ALGORITHM:

Step1: Start the Program

Step2: Calculate the means of the independent variable(s)

Step3: Calculate the deviations from the means

Step4: Optionally, calculate other statistics such as the coefficient of determination.

Step5: Stop the Program

THEORY:

Regression is a statistical method used to examine the relationship between two or more variables.
The primary goal is to model the relationship and make predictions.
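
For simple linear regression, the means and deviations mentioned in the algorithm give the slope and intercept in closed form; a minimal sketch with illustrative points:

import numpy as np

# Closed-form least squares from deviations about the means
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])
dx, dy = x - x.mean(), y - y.mean()
slope = (dx * dy).sum() / (dx ** 2).sum()
intercept = y.mean() - slope * x.mean()
print(slope, intercept)             # about 1.94 and 0.30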

PROGRAM:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
data = pd.DataFrame(np.hstack([X, y]), columns=["X", "y"])

X_train, X_test, y_train, y_test = train_test_split(data[["X"]], data["y"], test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R^2 Score: {r2}")

plt.scatter(X_test, y_test, color="blue", label="Actual")
plt.plot(X_test, y_pred, color="red", linewidth=2, label="Predicted")
plt.xlabel("X")
plt.ylabel("y")
plt.title("Linear Regression")
plt.legend()
plt.show()

OUTPUT:
RESULT:

Thus the program has been implemented successfully.
