FDS Lab 1 Manuel .1..1new

Ex .No.: 1
WORKING WITH PANDAS AND DATAFRAMES
Date:
AIM:
To create and work with a pandas DataFrame using a Python program.
ALGORITHM:
Step1: Start the program
Step2: Import the pandas module
Step3: Create a DataFrame from a list of rows
Step4: Print the output
Step5: Stop the program
PROGRAM :
import pandas as pd

data = [["Nila", 25, "India"],
        ["Chithra", 30, "Singapore"],
        ["Gowthami", 35, "Paris"]]
df = pd.DataFrame(data, columns=["Name", "Age", "City"])
print(df)
THEORY:
Pandas is a powerful Python library used for data manipulation and analysis. It provides two
primary data structures:
Series: A one-dimensional labeled array, similar to a column in a spreadsheet.
DataFrame: A two-dimensional labeled data structure with columns of potentially different types,
similar to a table in a database or a spreadsheet.
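As a small illustration of the two structures, the sketch below builds a DataFrame from a dictionary of columns (an alternative to the list of rows used in the program above) and selects one column as a Series. The variable names `people` and `ages` are illustrative only.

```python
import pandas as pd

# Dictionary of columns: each key becomes a column label
people = pd.DataFrame({
    "Name": ["Nila", "Chithra", "Gowthami"],
    "Age": [25, 30, 35],
    "City": ["India", "Singapore", "Paris"],
})

# Selecting a single column returns a Series
ages = people["Age"]

print(type(people).__name__)  # DataFrame
print(type(ages).__name__)    # Series
print(ages.mean())            # 30.0
```

A DataFrame can therefore be read as a collection of Series that share one index.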
OUTPUT:
RESULT:
Thus the program to create and work with a pandas DataFrame using Python has been executed and
the output was verified successfully.
Ex .No.: 2
WORKING WITH NUMPY ARRAYS
Date:
AIM:
To work with NumPy arrays using Python.
ALGORITHM:
Step1: Start the program
Step2: Import the numpy module
Step3: Print the basic characteristics and operations of the array
Step4: Stop the program
THEORY:
NumPy (Numerical Python) is a powerful library for numerical computations in Python. It
provides support for arrays, which are efficient, multidimensional containers for large
datasets of homogeneous types. These arrays enable a wide range of mathematical
operations to be performed on large datasets with high performance.
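The basic characteristics mentioned in the algorithm can be inspected through array attributes. The following is a minimal sketch; the arrays `arr` and `mixed` are made up for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.ndim)   # number of dimensions: 2
print(arr.shape)  # rows and columns: (2, 3)
print(arr.size)   # total number of elements: 6
print(arr.dtype)  # an integer type (exact width depends on the platform)

# Arrays are homogeneous: mixing ints and a float upcasts every element
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# Arithmetic is applied element-wise without an explicit loop
doubled = arr * 2
print(doubled)
```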
PROGRAM :
(A) Create a NumPy array of zeros and ones
import numpy as np

zeros_array = np.zeros((3, 3))
ones_array = np.ones((2, 4))
print("Zeros Array:")
print(zeros_array)
print("Ones Array:")
print(ones_array)
OUTPUT :
Zeros Array:
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
Ones Array:
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]
(B)Generate a Random NumPy Array
import numpy as np

random_array = np.random.rand(3, 3)
print("Random Array:")
print(random_array)
OUTPUT:
Random Array:
[[0.22550229 0.75346623 0.18106636]
 [0.36264152 0.87597885 0.53328317]
 [0.96555117 0.79383112 0.32111982]]
(C)Perform element-wise addition, subtraction, multiplication, and division
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
addition = np.add(arr1, arr2)
subtraction = np.subtract(arr1, arr2)
multiplication = np.multiply(arr1, arr2)
division = np.divide(arr1, arr2)
print("Addition:")
print(addition)
print("Subtraction:")
print(subtraction)
print("Multiplication:")
print(multiplication)
print("Division:")
print(division)
OUTPUT:
Addition:
[[ 6  8]
 [10 12]]
Subtraction:
[[-4 -4]
 [-4 -4]]
Multiplication:
[[ 5 12]
 [21 32]]
Division:
[[0.2        0.33333333]
 [0.42857143 0.5       ]]
(D) Transpose a NumPy array
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
transposed_arr = np.transpose(arr)
print("Original Array:")
print(arr)
print("Transposed Array:")
print(transposed_arr)
OUTPUT:
Original Array:
[[1 2 3]
[4 5 6]]
Transposed Array:
[[1 4]
[2 5]
[3 6]]
(E) Find the index of the maximum and minimum element along an axis
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
max_index_axis0 = np.argmax(arr, axis=0)
min_index_axis1 = np.argmin(arr, axis=1)
print("Index of Maximum Element (along axis 0):", max_index_axis0)
print("Index of Minimum Element (along axis 1):", min_index_axis1)
OUTPUT:
Index of Maximum Element (along axis 0): [1 1 1]
Index of Minimum Element (along axis 1): [0 0]

RESULT:
Thus the implementation of Python functions using NumPy arrays is successfully done.
Ex .No.: 3
BASIC PLOTS USING MATPLOTLIB
Date:
AIM:
To create basic plots using Matplotlib to visualize data distributions and trends.
ALGORITHM:
Step1: Start the program
Step2: Import the Matplotlib module
Step3: Create basic plots using Matplotlib
Step4: Print the output
Step5: Stop the program
THEORY:
Line plot: Displays data points connected by a line, often used to show trends over time.
Scatter plot: Shows the relationship between two variables, with each point representing an
observation.
Histogram: Displays the distribution of a dataset by grouping data into bins and counting the
number of observations in each bin.
Bar plot: Shows the count or value of different categories.
PROGRAM:
import matplotlib.pyplot as plt
import numpy as np

# Line plot
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y)
plt.title('Sine Wave')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

# Scatter plot
x = np.random.rand(100)
y = np.random.rand(100)
plt.scatter(x, y)
plt.title('Scatter Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

# Bar plot
categories = ['A', 'B', 'C', 'D']
values = [4, 7, 1, 8]
plt.bar(categories, values)
plt.title('Bar Plot')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.show()

# Histogram
data = np.random.randn(1000)
plt.hist(data, bins=30, edgecolor='black')
plt.title('Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

# Pie chart
labels = ['A', 'B', 'C', 'D']
sizes = [15, 30, 45, 10]
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=140)
plt.title('Pie Chart')
plt.show()

# Create plot with different styles
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
plt.plot(x, y1, label='Sine', color='blue', linestyle='--')
plt.plot(x, y2, label='Cosine', color='red', linestyle='-.')
# Add titles, labels, and legend
plt.title('Sine and Cosine Waves')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
# Show plot
plt.show()

# Line plot with grid
y = np.sin(x)  # recompute: y still held the random scatter data
plt.plot(x, y)
plt.title('Sine Wave with Grid')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.grid(True)
plt.show()

# Data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
# Create subplots
fig, axs = plt.subplots(2)
# Plot on each subplot
axs[0].plot(x, y1, 'r')
axs[0].set_title('Sine')
axs[1].plot(x, y2, 'b')
axs[1].set_title('Cosine')
# Adjust layout
plt.tight_layout()
plt.show()
OUTPUT :
RESULT:
Thus the basic plots were created successfully using Matplotlib.
Ex .No.: 4
FREQUENCY DISTRIBUTION, AVERAGES AND VARIABILITY
Date:
AIM:
To understand the distribution of values within a dataset by finding the frequency of each
unique value, the central tendency, and the dispersion of the data around that central tendency.
ALGORITHM:
Step2: Create an empty dictionary to store the frequency distribution.
Step3: Count how many times each number appears in the dataset.
Step4: Return the frequency distribution dictionary.
Step5: Add up all the numbers in the dataset.
Step6: Divide the sum by the total count of numbers to get the mean.
Step7: Sort the dataset from smallest to largest to find the median.
Step8: Analyze the results for frequency distributions, averages, and variability.
THEORY:
A frequency distribution is a summary of how often different values occur within a data set.
Averages are measures of central tendency that summarize a set of data with a single value
representing the center of the data. Variability refers to how spread out the data values are in a data
set.
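As a compact illustration of these three ideas, the sketch below uses only the standard library on the data values from this exercise's program. `collections.Counter` is swapped in here as a shortcut for the hand-written frequency dictionary, and both the sample variance (divides by n - 1) and the population variance (divides by n) are shown because they are easy to confuse.

```python
from collections import Counter
import statistics

data = [10, 20, 30, 40, 50, 20, 30, 40, 20, 10]

# Frequency distribution: how often each value occurs
freq = Counter(data)
print(dict(sorted(freq.items())))  # {10: 2, 20: 3, 30: 2, 40: 2, 50: 1}

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)
print(mean, median)  # 27 25.0

# Variability: sample variance divides by n - 1, population variance by n
sample_var = statistics.variance(data)
pop_var = statistics.pvariance(data)
print(sample_var, pop_var)
```

The sample variance is the larger of the two; it is the one reported by `statistics.variance` in the program.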
PROGRAM :
import statistics

def frequency_distribution(data):
    freq_dist = {}
    for item in data:
        if item in freq_dist:
            freq_dist[item] += 1
        else:
            freq_dist[item] = 1
    return freq_dist

def calculate_mean(data):
    return statistics.mean(data)

def calculate_median(data):
    return statistics.median(data)

def calculate_mode(data):
    return statistics.mode(data)

def calculate_range(data):
    return max(data) - min(data)

def calculate_variance(data):
    return statistics.variance(data)

def calculate_std_dev(data):
    return statistics.stdev(data)

data = [10, 20, 30, 40, 50, 20, 30, 40, 20, 10]
freq_dist = frequency_distribution(data)
print("Frequency Distribution:", freq_dist)
mean = calculate_mean(data)
print("Mean:", mean)
median = calculate_median(data)
print("Median:", median)
mode = calculate_mode(data)
print("Mode:", mode)
data_range = calculate_range(data)
print("Range:", data_range)
variance = calculate_variance(data)
print("Variance:", variance)
std_dev = calculate_std_dev(data)
print("Standard Deviation:", std_dev)
OUTPUT:
Frequency Distribution: {10: 2, 20: 3, 30: 2, 40: 2, 50: 1}
Mean: 27
Median: 25.0
Mode: 20
Range: 40
Variance: 178.88888888888889
Standard Deviation: 13.37493509849258
RESULT:
Thus the implementation of frequency distributions, averages, and variability is successfully
done.
Ex .No.: 5
NORMAL CURVES, CORRELATION AND SCATTER PLOTS, CORRELATION COEFFICIENT
Date:
AIM:
To visualize the distribution of data points along a bell-shaped curve. To visually represent the
relationship between two variables by plotting their data points. To quantify the strength and
direction of the linear relationship between two variables.
ALGORITHM:
Step1: Start the Program
Step2: Normal Curves: Generate an array of evenly spaced x-values ranging from min to max with
n points, and evaluate the normal density at each point.
Step3: Scatterplots Algorithm: Plot the x and y data points on a 2D graph.
Label the x and y axes appropriately.
Add a title to the graph.
Optionally, add gridlines for better visualization.
Display the scatterplot.
Step4: Correlation Coefficient Algorithm: Calculate the mean of x and of y. Calculate the
difference of each value from its mean. Multiply the differences for each pair of (x, y) values.
Sum up the products. Calculate the sum of squared differences for x and for y. Divide the sum of
products by the square root of the product of the two sums of squares. The result is the
correlation coefficient.
Step5: Stop the Program
THEORY:
A normal curve (or Gaussian distribution) is a bell-shaped curve that represents the distribution of a
set of data. It's symmetric around the mean and characterized by its mean (μ) and standard
deviation (σ). A scatter plot is a graph that shows the relationship between two variables using
Cartesian coordinates. The correlation coefficient (often denoted as r) quantifies the strength and
direction of the linear relationship between two variables.
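To make the definition of r concrete, here is a minimal sketch that computes the Pearson correlation coefficient by hand with the standard library. The function name `pearson_r` and the sample values are illustrative; library routines such as `scipy.stats.pearsonr` should give the same value.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Differences of each value from its mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of products of paired differences
    sum_products = sum(a * b for a, b in zip(dx, dy))
    # Sums of squared differences
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    return sum_products / math.sqrt(ss_x * ss_y)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
print(pearson_r(xs, ys))  # 1.0 (perfect positive linear relationship)
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.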
PROGRAM :
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm, pearsonr

# Generate some data
np.random.seed(0)
mean = 0
std_dev = 1
data = np.random.normal(mean, std_dev, 1000)

# Plotting the normal curve
plt.figure(figsize=(10, 6))
sns.histplot(data, kde=True, stat='density', bins=30, label='Data Histogram')
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = norm.pdf(x, mean, std_dev)
plt.plot(x, p, 'k', linewidth=2, label='Normal Distribution PDF')
plt.title('Normal Distribution Curve')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.show()

# Generate data for scatter plot and correlation
np.random.seed(0)
x = np.random.rand(100)
y = 2 * x + 1 + np.random.normal(0, 0.1, 100)

# Scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(x, y, label='Data points')
plt.title('Scatter Plot')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()

# Calculate and display the correlation coefficient
corr_coeff, _ = pearsonr(x, y)
print(f'Correlation Coefficient: {corr_coeff:.2f}')

# Display the scatter plot with regression line
sns.regplot(x=x, y=y, ci=None)
plt.title('Scatter Plot with Regression Line')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
OUTPUT:
RESULT:
Thus the implementation of normal curves, correlation and scatter plots, and the correlation
coefficient is successfully done.
Ex .No. : 6
Z – TEST
Date:
AIM:
To implement the z-test to determine whether the mean of a sample is statistically significantly
different from the known or hypothesized population mean.
ALGORITHM:
Step 1: Define the two groups of data.
Step 2: Calculate means, standard deviations, and lengths for each group.
Step 3: Calculate the standard error and z-score.
Step 4: Calculate the p-value using the z-score.
Step 5: Print the results (means, standard deviations, z-score, p-value).
Step 6: Test the hypothesis based on the p-value and a significance level of 0.05.
Step 7: Plot a histogram for both groups, with labels, title, and legend.
THEORY:
A Z-test is a statistical test used to determine whether there is a significant difference between
sample and population means or between the means of two samples. It is applicable when the data
is approximately normally distributed and the sample size is large (typically n>30).
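The program in this exercise compares two samples. For the other case named above, a sample mean against a population mean, a minimal sketch follows. It assumes the population standard deviation is known; the function name `one_sample_z_test` and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample, pop_mean, pop_std):
    """Two-sided one-sample z-test with a known population standard deviation."""
    n = len(sample)
    sample_mean = sum(sample) / n
    standard_error = pop_std / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical sample of 36 values with mean 52
sample = [50, 54] * 18
z, p = one_sample_z_test(sample, pop_mean=50, pop_std=6)
print(round(z, 2), round(p, 4))  # 2.0 0.0455
```

With a significance level of 0.05, this hypothetical sample would lead to rejecting the null hypothesis.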
PROGRAM :
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

group1 = [21, 21.5, 22, 22.5, 23, 24.5, 25, 27, 30, 25]
group2 = [31, 31.5, 33, 33.5, 35, 35.5, 37, 38, 37, 40]

mean1, mean2 = np.mean(group1), np.mean(group2)
std1, std2 = np.std(group1, ddof=1), np.std(group2, ddof=1)
n1, n2 = len(group1), len(group2)

se = np.sqrt(std1**2/n1 + std2**2/n2)
z = (mean1 - mean2) / se
p_value = stats.norm.sf(abs(z)) * 2

print(f"Group 1 mean: {mean1}")
print(f"Group 2 mean: {mean2}")
print(f"Group 1 standard deviation: {std1}")
print(f"Group 2 standard deviation: {std2}")
print(f"Z-score: {z}")
print(f"P-value: {p_value}")

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the means.")
else:
    print("Fail to reject the null hypothesis: There is no significant difference between the means.")

plt.hist(group1, alpha=0.5, label='Group 1')
plt.hist(group2, alpha=1, label='Group 2')
plt.legend(loc='upper right')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram of Group 1 and Group 2')
plt.show()
OUTPUT:
RESULT:
Thus the program has been implemented successfully.
Ex .No. : 7
T- TEST
Date:
AIM:
To implement the t-test.
ALGORITHM:
Step 1: Define helper functions for variance checking and performing t-tests.
Step 2: Generate random data for one-sample, independent two-sample, and paired scenarios.
Step 3: Check variance and perform one-sample t-test.
Step 4: Check variances and perform independent two-sample t-test.
Step 5: Check variances and perform paired t-test.
Step 6: Visualize the distributions of the two independent groups using histograms with labels,
title, and legend.
THEORY:
A t-test is a statistical test used to determine if there is a significant difference between the means of
two groups. It helps to ascertain whether the observed differences in sample means are likely to
reflect real differences in the populations from which the samples were drawn, or if they might
have occurred by random chance.
Types of t-tests:
1. Independent Samples t-test: Compares the means of two independent groups (e.g., treatment
vs. control groups).
2. Paired Samples t-test: Compares the means of the same group at two different times (e.g.,
before and after treatment).
3. One-Sample t-test: Compares the sample mean to a known value (e.g., a population mean).
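As an illustration of what the test statistic measures, the sketch below computes the one-sample t-statistic by hand with the standard library. The function name `one_sample_t_statistic` and the data are hypothetical; the p-value would then come from the t distribution with n - 1 degrees of freedom, which is what `scipy.stats.ttest_1samp` handles in the program.

```python
import math
import statistics

def one_sample_t_statistic(sample, pop_mean):
    """Return the t-statistic and degrees of freedom for a one-sample t-test."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_std = statistics.stdev(sample)  # sample standard deviation (n - 1)
    standard_error = sample_std / math.sqrt(n)
    t_stat = (sample_mean - pop_mean) / standard_error
    return t_stat, n - 1

# Hypothetical measurements compared against a hypothesized mean of 24
t_stat, dof = one_sample_t_statistic([24, 26, 25, 27, 23], pop_mean=24)
print(round(t_stat, 4), dof)  # 1.4142 4
```

The paired t-test reduces to this same calculation applied to the per-pair differences with a hypothesized mean of 0.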
PROGRAM:
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

def check_variance(group, threshold=1e-5):
    variance = np.var(group)
    if variance < threshold:
        print(f"Warning: Very low variance detected (Variance: {variance}). Results may be unreliable.")
    return variance

def perform_t_tests(group1, group2=None, test_type='independent'):
    if test_type == 'one-sample':
        population_mean = group2
        t_stat, p_value = stats.ttest_1samp(group1, population_mean)
        print(f"One-Sample T-Test:\nT-statistic: {t_stat}, P-value: {p_value}")
    elif test_type == 'independent':
        t_stat, p_value = stats.ttest_ind(group1, group2)
        print(f"Independent Two-Sample T-Test:\nT-statistic: {t_stat}, P-value: {p_value}")
    elif test_type == 'paired':
        t_stat, p_value = stats.ttest_rel(group1, group2)
        print(f"Paired T-Test:\nT-statistic: {t_stat}, P-value: {p_value}")
    alpha = 0.05
    if p_value < alpha:
        print("Reject the null hypothesis: The means are significantly different.")
    else:
        print("Fail to reject the null hypothesis: The means are not significantly different.")
    print()

np.random.seed(42)
one_sample_data = np.random.normal(25, 0.01, 30)
population_mean = 25
group1 = np.random.normal(25, 5, 30)
group2 = np.random.normal(30, 5, 30)
before = np.random.normal(25, 5, 30)
after = before + np.random.normal(0, 5, 30)

check_variance(one_sample_data)
perform_t_tests(one_sample_data, population_mean, 'one-sample')
check_variance(group1)
check_variance(group2)
perform_t_tests(group1, group2, 'independent')
check_variance(before)
check_variance(after)
perform_t_tests(before, after, 'paired')

sns.histplot(group1, kde=True, label='Group 1', color='blue')
sns.histplot(group2, kde=True, label='Group 2', color='red')
plt.legend()
plt.title('Distribution of Group 1 and Group 2')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

OUTPUT:
RESULT:
Thus the implementation of t-test is successfully done.
Ex .No. : 8
ANOVA
Date:
AIM:
To implement ANOVA (Analysis of Variance) to determine whether there is a
significant difference between the means of three or more groups.
ALGORITHM:
Step2: Formulate Hypotheses
Step3: Set Significance Level
Step4: Calculate Group Means
Step5: Calculate Overall Mean
Step6: Calculate Sum of Squares
Step7: Calculate F-Statistic
Step8: Determine Critical Value or P-value
Step9: Make Decision
THEORY:
ANOVA, or Analysis of Variance, is a statistical technique used to determine if there are
significant differences between the means of three or more groups. It helps to test
hypotheses about whether the group means are all equal or if at least one group mean is
different from the others.
Types of ANOVA:
1. One-Way ANOVA: Tests for differences among group means in a single factor (e.g.,
comparing test scores of students across different teaching methods).
2. Two-Way ANOVA: Examines the effect of two different factors simultaneously and can
also assess the interaction between the two factors (e.g., studying the effect of teaching
method and study time on test scores).
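The group means, overall mean, sums of squares, and F-statistic listed in the algorithm can be worked through by hand for the one-way case. The following is a minimal sketch in plain Python; the function name `one_way_anova_f` and the three small groups are hypothetical, and `scipy.stats.f_oneway` should report the same F value.

```python
def one_way_anova_f(groups):
    """Return the F-statistic of a one-way ANOVA for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_stat, 4))  # 13.0
```

A large F means the group means differ by more than the within-group scatter would suggest; the p-value is then read from the F distribution with (df_between, df_within) degrees of freedom.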
PROGRAM:
import numpy as np
import scipy.stats as stats
import statsmodels.api as sm
from statsmodels.formula.api import ols
import pandas as pd

# Example data
# Three groups with different means
np.random.seed(0)
group1 = np.random.normal(20, 5, 30)
group2 = np.random.normal(22, 5, 30)
group3 = np.random.normal(23, 5, 30)

# Combine the data into a single dataframe
data = pd.DataFrame({
    'value': np.concatenate([group1, group2, group3]),
    'group': np.repeat(['group1', 'group2', 'group3'], repeats=30)
})

# Perform ANOVA using statsmodels
model = ols('value ~ C(group)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

f_val, p_val = stats.f_oneway(group1, group2, group3)
print(f"F-statistic: {f_val}, P-value: {p_val}")
OUTPUT:
RESULT:
Thus the implementation of the ANOVA (Analysis of Variance) is successfully done.
Ex .No. : 9
BUILDING AND VALIDATING LINEAR MODELS
Date:
AIM:
To build and validate linear models.
ALGORITHM:
Step 1: Import Libraries: Import necessary libraries for numpy, matplotlib, and scikit-learn.
Step 2: Generate Data: Set random seed. Generate feature matrix X and target vector y.
Step 3: Split Data: Split data into training and test sets.
Step 4: Train Model: Initialize and train a linear regression model using training data.
Step 5: Predict: Predict target values for test data.
Step 6: Evaluate Model: Calculate and print Mean Squared Error (MSE).
Step 7: Visualize Results: Plot actual vs predicted values.
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', label='Predicted')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Actual vs Predicted')
plt.legend()
plt.show()
OUTPUT:
RESULT:
Thus the building and validating of linear models is successfully done.
Ex .No. : 10
BUILDING AND VALIDATING LOGISTIC MODELS
Date:
AIM:
To build and validate logistic models.

ALGORITHM:
Step 1: Import Libraries: Import numpy, matplotlib.pyplot, and scikit-learn modules.
Step 2: Generate Data: Set random seed. Generate feature matrix X and target vector y using random
values.
Step 3: Split Data: Split data into training and test sets using train_test_split.
Step 4: Train Model: Initialize and train a linear regression model using LinearRegression.
Step 5: Predict: Use the trained model to predict target values for the test data.
Step 6: Evaluate Model: Calculate and print the Mean Squared Error (MSE) using
mean_squared_error.
Step 7: Visualize Results: Plot actual vs predicted values using matplotlib.
THEORY:
Building and validating logistic regression models involves preparing the data, specifying the
logistic function, estimating model parameters, and fitting the model. Model validation ensures that
the model performs well on new, unseen data by evaluating it using various metrics,
cross-validation, and diagnostic tests. Regularization techniques are employed to mitigate overfitting
and improve generalizability.
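The fitting and validation steps described in this theory can be sketched with scikit-learn, independently of the program listed in this exercise. This is an illustrative sketch only: the synthetic data, the threshold of 5, and the variable names are assumptions, and `LogisticRegression` applies L2 regularization by default.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic binary data: label is 1 when a noisy copy of the feature exceeds 5
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = (X[:, 0] + rng.normal(0, 1, size=200) > 5).astype(int)

# Hold out a test set so the model is evaluated on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Test accuracy:", accuracy)

# 5-fold cross-validation as a second check on generalization
cv_scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print("Cross-validation accuracy:", cv_scores.mean())
```

Accuracy is used here for simplicity; other metrics such as precision, recall, or ROC AUC may suit imbalanced data better.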
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import cdist

def locally_weighted_regression(x_query, X, y, tau):
    """
    Perform Locally Weighted Regression for a query point x_query.

    Parameters:
    x_query : float
        The query point at which to predict the value.
    X : numpy.ndarray
        The input features (training data).
    y : numpy.ndarray
        The target values (training data).
    tau : float
        The bandwidth parameter for the weighting function.

    Returns:
    float
        The predicted value at the query point.
    """
    # Add intercept term to X and x_query
    m = X.shape[0]
    X_ = np.hstack((np.ones((m, 1)), X.reshape(-1, 1)))
    x_query_ = np.array([1, x_query]).reshape(1, -1)
    # Compute weights
    distances = cdist(X_, x_query_, metric='euclidean').flatten()
    weights = np.exp(-distances**2 / (2 * tau**2))
    # Perform weighted linear regression
    W = np.diag(weights)
    theta = np.linalg.inv(X_.T @ W @ X_) @ X_.T @ W @ y
    return x_query_ @ theta

# Generate synthetic dataset
np.random.seed(42)
X = np.linspace(0, 10, 100)
y = np.sin(X) + np.random.normal(scale=0.1, size=X.shape)

# Fit Locally Weighted Regression for a range of query points
tau = 0.5  # Bandwidth parameter
x_query_points = np.linspace(0, 10, 100)
y_pred = np.array([locally_weighted_regression(x_query, X, y, tau) for x_query in x_query_points])

# Plot the results
plt.figure(figsize=(10, 6))
plt.scatter(X, y, label='Training Data', color='blue')
plt.plot(x_query_points, y_pred, label='LWR Predictions', color='red')
plt.title('Locally Weighted Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()
OUTPUT:
RESULT:
Thus the building and validating of logistic models is successfully done.
Ex .No. : 11
TIME SERIES ANALYSIS
Date:
AIM:
To implement a time series analysis by using pandas and matplotlib libraries
ALGORITHM:
Step 1: Import Libraries: Import pandas and matplotlib.pyplot.
Step 2: Create Data: Create a dictionary with date range and corresponding values.
Step 3: Create Data Frame: Convert dictionary to Data Frame. Convert date column to datetime. Set
date column as index.
Step 4: Plot Data: Plot the time series data. Customize plot with labels, title, legend, and grid.
Step 5: Display Plot: Show the plot.
Step 6: Generate Summary Statistics: Print summary statistics of the Data Frame. Print the first few
rows of the Data Frame.
THEORY:
Time series analysis involves methods for analyzing time-ordered data points to extract
meaningful statistics, identify patterns, and forecast future values. It is widely used in
various fields, such as economics, finance, environmental science, and engineering, where
understanding temporal dynamics is crucial.
Components of a Time Series:
• Trend: The long-term progression of the series (upward, downward, or stationary).

• Seasonality: Regular, repeating patterns or cycles in the data (e.g., monthly sales due to
holidays).
• Cyclicality: Irregular, longer-term oscillations (e.g., economic cycles).
• Noise (Residual): Random variation or irregular fluctuations in the data.
Stationarity:
• Definition: A stationary time series has statistical properties, such as mean and variance,
that are constant over time.
• Importance: Many time series models assume stationarity. Non-stationary series often
need to be transformed to stationary (e.g., differencing).
Autocorrelation:
• Definition: The correlation of a time series with its own past values.
• ACF and PACF: Autocorrelation Function (ACF) and Partial Autocorrelation Function
(PACF) help identify the presence of autocorrelation and determine appropriate model
parameters.
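Differencing and autocorrelation can both be tried directly in pandas. The sketch below is a minimal illustration on the five values used in this exercise's program; such a short series is far too small for real analysis and is used only to show the calls.

```python
import pandas as pd

series = pd.Series(
    [10, 12, 15, 18, 20],
    index=pd.date_range(start="2024-01-01", periods=5),
)

# First-order differencing removes a linear trend: each value minus the previous one
diffed = series.diff().dropna()
print(diffed.tolist())  # [2.0, 3.0, 3.0, 2.0]

# Lag-1 autocorrelation: correlation of the series with itself shifted by one step
lag1 = series.autocorr(lag=1)
print(lag1)
```

The original series trends upward, while the differenced series fluctuates around a roughly constant level, which is the behaviour expected of a stationary series.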
PROGRAM:
import pandas as pd
import matplotlib.pyplot as plt

data = {
    'date': pd.date_range(start='2024-01-01', periods=5),
    'value': [10, 12, 15, 18, 20]
}
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)

plt.figure(figsize=(10, 6))
plt.plot(df.index, df['value'], label='Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.title('Time Series Data')
plt.legend()
plt.grid(True)
plt.show()

print("Summary Statistics:")
print(df.describe())
print("\nOutput:")
print(df.head())
OUTPUT:
RESULT:
Thus the implementation of a time series analysis is successfully done.
Ex .No. : 12
REGRESSION
Date:
AIM:
To model and predict the value of the dependent variable based on the values of the independent
variables.
ALGORITHM:
Step1: Start the Program
Step2: Calculate the means of the independent variable(s)
Step3: Calculate the deviations from the means
Step4: Optionally, you can calculate other statistics such as the coefficient of determination
Step5: Stop the Program
THEORY:
Regression is a statistical method used to examine the relationship between two or more variables.
The primary goal is to model the relationship and make predictions.
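For a single independent variable, the means and deviations named in the algorithm lead directly to the least-squares line. The sketch below is one way to write it with NumPy; the function name `fit_line` and the small dataset are hypothetical, and scikit-learn's `LinearRegression` should recover the same slope and intercept.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares slope, intercept, and R^2 for simple linear regression."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Deviations of each value from its mean
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    slope = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)
    intercept = y.mean() - slope * x.mean()
    # Coefficient of determination: share of variance explained by the line
    y_hat = intercept + slope * x
    r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum(y_dev ** 2)
    return slope, intercept, r2

# Points lying exactly on y = 4 + 3x
slope, intercept, r2 = fit_line([0, 1, 2, 3], [4, 7, 10, 13])
print(slope, intercept, r2)  # 3.0 4.0 1.0
```

With noisy data, the slope and intercept only approximate the underlying relationship and R^2 falls below 1.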
PROGRAM:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
data = pd.DataFrame(np.hstack([X, y]), columns=["X", "y"])

X_train, X_test, y_train, y_test = train_test_split(data[["X"]], data["y"], test_size=0.2,
                                                    random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R^2 Score: {r2}")

plt.scatter(X_test, y_test, color="blue", label="Actual")
plt.plot(X_test, y_pred, color="red", linewidth=2, label="Predicted")
plt.xlabel("X")
plt.ylabel("y")
plt.title("Linear Regression")
plt.legend()
plt.show()
OUTPUT:
RESULT:
Thus the program has been implemented successfully.
