Python Written Assignment
Master's thesis
Submitted by
Subodhini Balu Bhosale
42310022
May, 2024
Abstract
This assignment, titled "Analyzing the Impact of Python Libraries on Data Science," delves into the
pivotal role of Python libraries in shaping contemporary data science practices. Beginning with an
overview of Python's ascendancy as a premier language in the field, it underscores the significance of
libraries in augmenting Python's capabilities for data manipulation, analysis, and visualization. The
main body of the assignment elucidates three key Python libraries essential for data science: NumPy, Pandas, and Matplotlib. Each section provides a comprehensive examination of the library's functionalities, elucidating their advantages and applications in real-world scenarios. For instance, the discussion on NumPy delves into its fundamental role in numerical computing, explicating
NumPy arrays, their superiority over conventional Python lists, and showcasing NumPy functions for
array manipulation, mathematical operations, and linear algebra. Similarly, the analysis of Pandas un-
derscores its indispensable role in data manipulation and analysis, introducing Pandas Series and
DataFrame structures, and demonstrating Pandas functions for data cleaning, transformation, filtering,
and aggregation. Moreover, the exploration of Matplotlib highlights the crucial aspect of data visualiza-
tion in data science, introducing Matplotlib's capabilities for crafting various types of plots and charts,
and showcasing its functionalities for visualizing data distributions, trends, and relationships. Through-
out the assignment, reasoned arguments are bolstered by theoretical underpinnings, practical illustra-
tions, and references to scholarly sources, ensuring a thorough and insightful analysis. The conclusion
succinctly summarizes the assignment's key insights, emphasizing the transformative impact of
Python libraries in empowering data scientists to derive actionable insights from intricate datasets and
steer data-driven decision-making across diverse domains.
Table of Contents
1. Introduction……………………………………………………………………………….............................1
1.3.2. Pandas…………………………………………………………………………………………………………2
1.3.3. Matplotlib………………………………………………………………………………………………………2
2. NumPy…………………………………………………………………………………………………………3
2.1. Overview of NumPy and its Role in Numerical Computing…………………………………………………..3
2.2. Explanation of NumPy Arrays and their Advantages over Traditional Python Lists…………………………..3
2.3. Demonstrating NumPy Functions for Array Manipulation, Mathematical Operations, and Linear Algebra...4
3. Pandas…………………………………………………………………………………………………………8
3.1. Introduction to Pandas for Data Manipulation and Analysis……………………………………………………8
3.3. Utilizing Pandas Functions for Data Cleaning, Transformation, Filtering, and Aggregation………………….9
4. Matplotlib……………………………………………………………………………………………………..12
4.1. Importance of Data Visualization in Data Science……………………………………………………………12
4.2. Introduction to Matplotlib for Creating Various Types of Plots and Charts…………………………………...13
4.3. Demonstrating Matplotlib's Functionalities for Visualizing Data Distributions, Trends, and Relationships…13
5. Conclusion…………………………………………………………………………………………………..18
5.1 Key points…………………………………………………………………………………………………………...18
5.1.1 NumPy…………………………………………………………………………………………………18
5.1.2 Pandas………………………………………………………………………………………………………..18
5.1.3 Matplotlib……………………………………………………………………………………………………...18
6. Bibliography……………………………………………………………………………………………………………20
List of Figures
Including Numpy___________________________________________________________________________________- 3 -
Creating Arrays____________________________________________________________________________________- 4 -
Reshaping Array____________________________________________________________________________________- 4 -
Numpy terminology_________________________________________________________________________________- 5 -
Broadcasting in Numpy______________________________________________________________________________- 5 -
Random Number Generation__________________________________________________________________________- 6 -
Data Cleaning______________________________________________________________________________________- 6 -
Including Pandas___________________________________________________________________________________- 7 -
Creating pandas Series_______________________________________________________________________________- 8 -
Creating Dataframes________________________________________________________________________________- 8 -
Apply() function in Pandas____________________________________________________________________________- 9 -
Function groupby()__________________________________________________________________________________- 9 -
Time series analysis_________________________________________________________________________________- 9 -
Practical examples_________________________________________________________________________________- 10 -
Difference between Numpy and Pandas________________________________________________________________- 10 -
Histogram_______________________________________________________________________________________- 11 -
Box Plot_________________________________________________________________________________________- 12 -
Scatter Plot_______________________________________________________________________________________- 13 -
Pie chart_________________________________________________________________________________________- 13 -
Advanced Customization Techniques__________________________________________________________________- 14 -
Result of Advanced Techniques_______________________________________________________________________- 14 -
Result of Advanced Techniques_______________________________________________________________________- 15 -
Animated plots____________________________________________________________________________________- 15 -
Result of Animated plots____________________________________________________________________________- 15 -
1. Introduction:
The landscape of data science is currently experiencing a significant shift, largely driven by the wide-
spread adoption of Python and its extensive range of libraries tailored for data analysis and manipula-
tion. In this era of unprecedented data abundance, organizations across various sectors are increas-
ingly relying on Python libraries to derive actionable insights from complex datasets. As the volume,
velocity, and variety of data continue to grow, the need for robust analytical tools has never been more
pressing.
This assignment aims to systematically explore the profound impact of Python libraries on data sci-
ence, elucidating their significance and implications within the context of contemporary data-driven en-
deavors. Recent studies and scholarly discourse underscore the pivotal role of Python libraries in
shaping the data science landscape. From small-scale startups to multinational corporations, Python
has emerged as the language of choice for data professionals seeking to extract value from their data
assets.
As organizations grapple with the challenges posed by burgeoning datasets and evolving analytical
techniques, the relevance of Python libraries becomes increasingly pronounced. By examining the
open questions surrounding the efficacy, limitations, and future directions of Python libraries in data
science, we aim to shed light on their transformative potential and pave the way for informed decision-
making.
The aim of this assignment is to analyze and evaluate the multifaceted impact of Python libraries on
data science practices. By delineating the boundaries of our inquiry and defining key terms, we provide readers with a comprehensive understanding of the parameters within which our analysis operates. Our objective is not only to elucidate the capabilities and limitations of Python libraries but also to
explore their broader implications for data science methodologies and workflows. Through rigorous in-
quiry and critical analysis, we endeavor to contribute to the ongoing discourse surrounding Python's
role in shaping the future of data-driven decision-making.
Python's popularity in data science can be attributed to its user-friendly design and broad applicability.
Its clean and readable syntax lowers the barrier to entry, enabling individuals from diverse back-
grounds to quickly grasp the fundamentals of programming. Moreover, Python's dynamic typing and
high-level abstractions facilitate rapid prototyping and experimentation, fostering a culture of innova-
tion within the data science community.
Beyond its syntactic elegance, Python's versatility extends to its ecosystem of libraries, which serve as
the lifeblood of data science workflows. These libraries augment Python's core functionality, providing
specialized tools for tasks ranging from data manipulation to machine learning. As a result, Python has
become the language of choice for data professionals seeking to extract actionable insights from com-
plex datasets.
While Python's core language features are robust, its true power lies in its extensive collection of third-
party libraries. These libraries, developed and maintained by a vibrant community of contributors, ex-
tend Python's capabilities in myriad ways, empowering data scientists to tackle real-world challenges
with confidence.
In the context of data science, libraries play a pivotal role in accelerating workflows and facilitating reproducible research. By abstracting complex operations into simple function calls, libraries such as
NumPy, Pandas, and Matplotlib enable data scientists to focus on high-level analysis rather than low-
level implementation details. This abstraction layer promotes code readability and maintainability, facil-
itating collaboration and knowledge sharing within interdisciplinary teams.
In summary, Python libraries serve as force multipliers, empowering data scientists to tackle complex
analytical challenges with efficiency and ease. In the subsequent sections, we will delve deeper into
the functionalities, applications, and impact of these libraries in data science workflows, unraveling the
intricacies of Python's role in shaping the future of data-driven decision-making.
2. NumPy:
2.1 Overview of NumPy and its Role in Numerical Computing:
NumPy, short for Numerical Python, stands as a cornerstone of numerical computing in the Python
ecosystem. Developed to address the shortcomings of traditional Python lists in handling numerical
data, NumPy provides a powerful framework for performing array-based computations efficiently. Its
array-oriented computing capabilities make it indispensable for a wide range of scientific and engi-
neering applications, including data analysis, machine learning, and simulations.
At the heart of NumPy lies its array object, ndarray, which enables efficient storage and manipulation
of homogeneous data. Unlike Python lists, NumPy arrays are homogeneous and contiguous blocks of
memory, allowing for vectorized operations and efficient memory management. This design choice not
only enhances computational performance but also facilitates interoperability with other libraries writ-
ten in low-level languages such as C and Fortran.
2.2 Explanation of NumPy Arrays and their Advantages over Traditional Python Lists:
NumPy arrays offer several advantages over traditional Python lists, making them the preferred data
structure for numerical computing tasks. Firstly, NumPy arrays are homogeneous, meaning that all el-
ements within an array must be of the same data type. This enforced homogeneity enables NumPy to
leverage optimized, low-level routines for array manipulation and arithmetic operations, resulting in
significant performance gains.
Moreover, NumPy arrays are stored in contiguous blocks of memory, allowing for efficient memory ac-
cess and vectorized operations. This contiguous memory layout enables NumPy to perform array
computations in a highly parallelized manner, leveraging the computational power of modern CPUs
and GPUs.
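The contrast can be sketched with a brief example; the data and sizes here are illustrative, and exact timings vary by machine, so none are claimed:

```python
import numpy as np

py_list = list(range(1_000_000))
np_array = np.arange(1_000_000)

# Element-wise doubling: a Python-level loop versus one vectorized call
doubled_list = [x * 2 for x in py_list]
doubled_array = np_array * 2  # executes in optimized C code over contiguous memory

# Homogeneity: every element of the array shares a single dtype
print(np_array.dtype)
```

The vectorized expression avoids per-element Python bytecode dispatch, which is where the performance gains described above come from.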
Including Numpy
NumPy is imported as 'np' throughout this assignment, following the standard Python convention 'import numpy as np'.
Another key advantage of NumPy arrays is their support for multidimensional data. While Python lists are inherently one-dimensional and can only emulate higher dimensions through awkward nesting, NumPy arrays can have any number of dimensions, making them suitable for representing complex data structures such as matrices, tensors, and images. This
multidimensional capability facilitates the manipulation of multidimensional datasets, enabling data scientists to work with data in its native form without the need for cumbersome reshaping or transposing
operations.
2.3. Demonstrating NumPy Functions for Array Manipulation, Mathematical Operations, and
Linear Algebra:
NumPy provides a rich set of functions and methods for array manipulation, mathematical operations,
and linear algebra. These functions enable data scientists to perform a wide range of tasks, from basic
array manipulation to advanced numerical computations.
Creating Arrays
NumPy provides functions for creating arrays of various shapes and sizes, initializing arrays with pre-
defined values, and reshaping arrays to suit specific requirements. Additionally, NumPy offers a
plethora of mathematical functions for performing element-wise operations such as addition, subtrac-
tion, multiplication, and division.
a. Creating an array
b. Reshaping the existing array and assigning it a new name
c. Printing the new array
Reshaping Array
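The steps annotated above can be sketched as follows; the array values are illustrative:

```python
import numpy as np

# a. Create a one-dimensional array of predefined values
arr = np.arange(12)

# b. Reshape the existing array into a 3x4 matrix under a new name
matrix = arr.reshape(3, 4)

# c. Print the new array, plus an element-wise operation on it
print(matrix)
print(matrix * 2)  # element-wise multiplication
```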
Furthermore, NumPy boasts a comprehensive suite of linear algebra functions for performing common
operations such as matrix multiplication, matrix inversion, and eigenvalue decomposition. These func-
tions enable data scientists to tackle complex mathematical problems with ease, making NumPy a ver-
satile tool for numerical computing tasks.
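A minimal sketch of these linear algebra routines, using illustrative matrix values:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.eye(2)  # the 2x2 identity matrix

product = A @ B                                # matrix multiplication
inverse = np.linalg.inv(A)                     # matrix inversion
eigenvalues, eigenvectors = np.linalg.eig(A)   # eigenvalue decomposition

# Multiplying a matrix by its inverse recovers the identity
print(np.allclose(A @ inverse, np.eye(2)))  # True
```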
Numpy terminology
Broadcasting is a powerful mechanism that allows NumPy to perform arithmetic operations on arrays
of different shapes. It automatically expands the smaller array to match the shape of the larger array,
enabling element-wise operations without the need for explicit looping.
Example:
Broadcasting in Numpy
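The mechanism described above can be illustrated with a small example in which a one-dimensional row vector is added to a two-dimensional matrix; the values are illustrative:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)

# The row is broadcast across both rows of the matrix
result = matrix + row
print(result)
# [[11 22 33]
#  [14 25 36]]
```

Note that no enlarged copy of the smaller array is materialized; NumPy simply strides over it, so broadcasting is memory-efficient as well as concise.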
2.4 NumPy for Random Number Generation:
NumPy includes a robust suite of functions for generating random numbers, which are essential for
simulations, statistical modeling, and machine learning tasks.
Example:
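One way to sketch this with NumPy's generator API; the seed and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded for reproducibility

uniform = rng.random(5)                               # 5 floats in [0, 1)
normal = rng.normal(loc=0.0, scale=1.0, size=(2, 3))  # standard normal draws
integers = rng.integers(low=1, high=7, size=10)       # simulated dice rolls

print(uniform.shape, normal.shape, integers)
```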
NumPy is widely used in data preprocessing and cleaning tasks, such as handling missing values,
normalizing data, and transforming data types.
Example:
Data Cleaning
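A small sketch of NumPy-based cleaning covering the three tasks just mentioned; the data values are illustrative:

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0, 4.0, np.nan])

# Handle missing values: replace NaNs with the mean of the valid entries
mean_val = np.nanmean(data)                      # mean ignoring NaNs
cleaned = np.where(np.isnan(data), mean_val, data)

# Normalize to the [0, 1] range
normalized = (cleaned - cleaned.min()) / (cleaned.max() - cleaned.min())

# Transform the data type
as_float32 = normalized.astype(np.float32)
print(cleaned)
```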
In summary, NumPy serves as a powerful toolkit for numerical computing in Python, offering efficient
array-based data structures and a wide range of functions for array manipulation, mathematical opera-
tions, and linear algebra. Its capabilities are essential for many scientific and engineering applications,
providing the foundation for data analysis, machine learning, and more. With a strong understanding
of NumPy, we are now ready to explore the pandas library, which builds on NumPy to provide even
more powerful data manipulation and analysis tools.
3. Pandas:
3.1 Introduction to Pandas for Data Manipulation and Analysis:
In the dynamic realm of data science, Pandas emerges as an indispensable tool, serving as the
bedrock for data manipulation and analysis in Python. Its widespread adoption and robust functionality
make it the cornerstone of countless data-driven projects, facilitating the exploration, transformation,
and analysis of diverse datasets. As we embark on this journey to explore Pandas comprehensively,
we delve into its multifaceted capabilities, illuminating its pivotal role in empowering data scientists,
analysts, and researchers to extract actionable insights from complex data.
Including Pandas
Pandas is imported as 'pd' using the standard Python convention 'import pandas as pd'.
3.2 Pandas Series and DataFrame Structures:
At the nucleus of Pandas lie two foundational data structures: Series and DataFrame. The Pandas Series represents a one-dimensional array-like object, equipped with labels or indices for efficient data
access. It encapsulates a single column of data, enabling users to manipulate and analyze data with
granularity and precision. On the other hand, the Pandas DataFrame extends this functionality to a
two-dimensional tabular structure, akin to a spreadsheet or database table. With rows and columns la-
beled for easy identification, DataFrames offer a structured framework for organizing, exploring, and
visualizing data of varying dimensions and complexities.
The versatility of Pandas Series and DataFrame data structures is underscored by their ability to accommodate heterogeneous data types seamlessly. Whether dealing with numerical measurements, categorical variables, or textual descriptions, Pandas provides a unified interface for handling diverse
data formats. This inherent flexibility empowers users to perform a myriad of operations, from data ag-
gregation and summarization to advanced statistical analysis and machine learning modeling.
Creating Dataframes
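The two structures can be sketched as follows; the labels and values are illustrative:

```python
import pandas as pd

# A one-dimensional, labeled Series
temperatures = pd.Series([21.5, 19.8, 23.1], index=['Mon', 'Tue', 'Wed'])

# A two-dimensional DataFrame with labeled rows and columns
df = pd.DataFrame({
    'city': ['Berlin', 'Munich', 'Hamburg'],
    'population': [3_600_000, 1_500_000, 1_900_000],
    'area_km2': [891.7, 310.7, 755.2],
})

print(temperatures['Tue'])  # label-based access
print(df.shape)
```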
3.3 Utilizing Pandas Functions for Data Cleaning, Transformation, Filtering, and Aggregation:
Pandas empowers users with a vast array of functions and methods for data cleaning, transformation,
filtering, and aggregation, facilitating the construction of robust data pipelines. These functions serve
as building blocks for preprocessing raw data, ensuring its quality, integrity, and consistency before
analysis.
For instance, Pandas offers a suite of functions for handling missing data, including isnull(), dropna(),
and fillna(), enabling users to address data incompleteness effectively. Moreover, Pandas facilitates
data transformation through functions like map(), apply(), and groupby(), allowing users to apply cus-
tom functions to data elements, group data by specific criteria, and compute aggregate statistics with
ease.
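These functions can be sketched together on a small, illustrative dataset:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'department': ['A', 'A', 'B', 'B'],
    'salary': [50_000, np.nan, 62_000, 58_000],
})

print(df['salary'].isnull().sum())  # count missing values: 1

# Fill the missing salary with the column mean, then transform and aggregate
filled = df.fillna({'salary': df['salary'].mean()})
doubled = filled['salary'].apply(lambda s: s * 2)        # element-wise transformation
by_dept = filled.groupby('department')['salary'].mean()  # aggregation per group
print(by_dept)
```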
Furthermore, Pandas provides robust support for data filtering and selection, offering intuitive indexing
and slicing mechanisms. Whether extracting subsets of data based on conditional criteria or selecting
specific columns for analysis, Pandas' expressive syntax streamlines the process of data extraction
and exploration.
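A brief sketch of that indexing syntax, using illustrative data:

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['pen', 'book', 'lamp', 'desk'],
    'price': [1.5, 12.0, 25.0, 150.0],
    'stock': [500, 40, 12, 3],
})

# Boolean filtering: keep rows meeting a conditional criterion
affordable = df[df['price'] < 30]

# Label-based selection of specific rows and columns with .loc
subset = df.loc[df['stock'] > 10, ['product', 'price']]
print(affordable['product'].tolist())
print(subset)
```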
Pandas excels in handling time series data, providing tools for datetime manipulation, resampling, and
rolling windows.
Examples:
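A minimal time-series sketch using resampling and a rolling window; the hourly readings are illustrative:

```python
import numpy as np
import pandas as pd

# Hourly readings over two days
idx = pd.date_range('2024-01-01', periods=48, freq='h')
ts = pd.Series(np.arange(48, dtype=float), index=idx)

daily_mean = ts.resample('D').mean()        # downsample to daily averages
rolling_avg = ts.rolling(window=6).mean()   # 6-hour rolling window

print(daily_mean)
print(rolling_avg.iloc[5])  # mean of the first six readings
```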
Providing practical examples and use cases can help illustrate how pandas is used in real-world sce-
narios, such as data cleaning, financial analysis, and machine learning preprocessing.
Examples:
Practical examples
In summary, pandas serves as a comprehensive and versatile library for data manipulation and analy-
sis, providing robust data structures and a wealth of functions for cleaning, transforming, and visualiz-
ing data. Its integration with other libraries and powerful features make it an indispensable tool in the
data science workflow. Next, we will explore Matplotlib, delving into how it further enhances our ability
to analyze and interpret data.
4. Matplotlib:
4.1 Importance of Data Visualization in Data Science:
Data visualization stands as an indispensable component of the data science toolkit, serving as a
bridge between raw data and actionable insights. In today's data-driven world, the ability to effectively
communicate complex patterns, trends, and relationships through visual representations is crucial for
driving informed decision-making and achieving organizational objectives. By harnessing the power of
data visualization, data scientists can uncover hidden patterns, identify outliers, and communicate
their findings to stakeholders in a clear and intuitive manner.
Data visualization plays a multifaceted role across various stages of the data analysis lifecycle. During
exploratory data analysis (EDA), visualizations serve as a lens through which analysts can gain initial
insights into the underlying structure of the data. From identifying data distributions and correlations to
detecting anomalies and outliers, visualizations provide a comprehensive overview of the dataset,
guiding subsequent analysis and hypothesis generation.
Moreover, in the model development phase, data visualization enables data scientists to evaluate
model performance, assess the validity of assumptions, and identify areas for improvement. By visual-
izing model predictions against actual outcomes, analysts can gain insights into the model's predictive
capabilities and identify instances where the model may be underperforming or overfitting the data.
Furthermore, in the presentation of findings and insights, data visualization plays a crucial role in con-
veying complex analytical results to non-technical stakeholders. Through visually compelling charts,
graphs, and dashboards, data scientists can distill key insights from the data and communicate them
in a digestible format, empowering stakeholders to make informed decisions and take appropriate ac-
tions.
Histogram
4.2 Introduction to Matplotlib for Creating Various Types of Plots and Charts:
Matplotlib, a cornerstone of the Python data visualization ecosystem, offers a comprehensive suite of
tools for creating a wide range of plots and charts. With its intuitive interface and powerful customiza-
tion options, Matplotlib empowers users to generate static, animated, and interactive visualizations tai-
lored to their specific analytical needs.
At its core, Matplotlib provides a high-level interface for creating basic plots, such as line plots, scatter
plots, bar charts, histograms, and pie charts. These fundamental plot types serve as building blocks
for more complex visualizations, allowing users to explore relationships, distributions, and trends
within their datasets.
In addition to basic plot types, Matplotlib offers support for advanced visualizations, including 3D plots,
geographic maps, and statistical plots. Through its integration with other Python libraries, such as
NumPy and Pandas, Matplotlib enables seamless data integration and visualization, facilitating the ex-
ploration of multidimensional datasets and complex relationships.
4.3 Demonstrating Matplotlib's Functionalities for Visualizing Data Distributions, Trends, and
Relationships:
Matplotlib's versatility shines through its ability to visualize data distributions, trends, and relationships
across diverse domains. Whether analyzing financial data, social networks, or scientific measure-
ments, Matplotlib provides a rich set of functionalities for exploring and interpreting complex datasets.
For instance, Matplotlib's plt.plot() function enables users to create line plots, ideal for visualizing
trends and patterns over time or across different variables. By plotting data points connected by lines,
users can identify temporal trends, cyclical patterns, and long-term relationships within the data.
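A small sketch of plt.plot() applied to a simple upward trend; the revenue figures are illustrative:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(1, 13)
revenue = 100 + 5 * months  # a simple linear trend

plt.plot(months, revenue, marker='o', label='monthly revenue')
plt.xlabel('Month')
plt.ylabel('Revenue (illustrative units)')
plt.title('Revenue trend over one year')
plt.legend()
plt.savefig('trend.png')
```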
Box Plot
Moreover, Matplotlib offers support for creating histograms (plt.hist()), bar charts (plt.bar()), and box plots (plt.boxplot()), among other types of plots, enabling users to analyze data distributions, compare categorical variables, and identify potential outliers or anomalies. These visualization techniques are instrumental in uncovering hidden patterns, assessing data quality, and deriving actionable insights from the data.
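A sketch combining a histogram and a box plot of the same sample; the data are randomly generated for illustration:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(loc=50, scale=10, size=1000)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(sample, bins=30)   # shape of the distribution
ax1.set_title('Histogram')
ax2.boxplot(sample)         # median, quartiles, and outliers
ax2.set_title('Box Plot')
fig.savefig('distribution.png')
```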
Furthermore, Matplotlib's extensive customization options allow users to fine-tune the appearance and
aesthetics of their plots, including colors, markers, line styles, axis labels, titles, and annotations. By
customizing the visual elements of their plots, users can create visually appealing and informative vi-
sualizations that effectively convey key insights and findings to diverse audiences.
Pie charts visually represent proportions of a whole, useful for illustrating simple distributions. How-
ever, they can be misleading when comparing values or categories and are not suitable for complex
datasets, leading to misinterpretations.
Pie chart
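A minimal pie-chart sketch; the categories and proportions are illustrative:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend
import matplotlib.pyplot as plt

shares = [45, 30, 15, 10]  # proportions of a whole
labels = ['Product A', 'Product B', 'Product C', 'Other']

fig, ax = plt.subplots()
ax.pie(shares, labels=labels, autopct='%1.0f%%')
ax.set_title('Market share (illustrative)')
fig.savefig('pie.png')
```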
Discussing advanced customization techniques can help users create more polished and publication-
quality plots.
Examples:
Advanced Customization Techniques
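One sketch of such customization; all styling choices here are illustrative:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, np.sin(x), color='tab:blue', linestyle='--', linewidth=2, label='sin(x)')
ax.plot(x, np.cos(x), color='tab:orange', label='cos(x)')
ax.set_title('Customized plot', fontsize=14, fontweight='bold')
ax.set_xlabel('x')
ax.set_ylabel('value')
ax.grid(True, alpha=0.3)
ax.annotate('local maximum', xy=(np.pi / 2, 1.0), xytext=(4, 1.2),
            arrowprops=dict(arrowstyle='->'))
ax.legend(loc='lower left')
fig.savefig('customized.png', dpi=150)
```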
Result of Advanced Techniques
Example:
Animated plots
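A sketch of an animated plot with matplotlib.animation.FuncAnimation; the sine wave and frame count are illustrative:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots()
x = np.linspace(0, 2 * np.pi, 200)
(line,) = ax.plot(x, np.sin(x))

def update(frame):
    # Shift the sine wave slightly on every frame
    line.set_ydata(np.sin(x + frame / 10))
    return (line,)

ani = FuncAnimation(fig, update, frames=60, interval=50, blit=True)
# ani.save('sine.gif', writer='pillow')  # exporting requires the Pillow package
```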
Result of Animated plots
In summary, Matplotlib stands as a cornerstone of data visualization in Python, offering a rich array of
plotting functions and customization options to visualize data distributions, trends, and relationships ef-
fectively. Whether through simple plots or complex visualizations, Matplotlib empowers data scientists
to communicate their insights clearly and effectively. By leveraging its extensive capabilities, data sci-
entists can transform raw data into meaningful visual representations that drive informed decision-
making and actionable outcomes.
5. Conclusion:
Throughout this assignment, we have delved into the significant impact of Python libraries, notably
NumPy, Pandas, and Matplotlib, on data science. These libraries are essential for data manipulation,
analysis, and visualization, enabling data scientists to extract actionable insights from complex
datasets and make informed decisions.
In drawing these arguments to a close, it is evident that Python libraries play a pivotal role in advancing data science. They offer robust frameworks for managing and analyzing data, thereby driving innovation and informed decision-making.
In conclusion, by harnessing the power of NumPy, Pandas, Matplotlib, and other Python libraries, data
scientists can effectively navigate the challenges of modern data analysis, fostering innovation and im-
pact across diverse fields.
6. Bibliography
1. Oliphant, T. E. (2006). A Guide to NumPy. USA: Trelgol Publishing.
2. McKinney, W. (2010). Data Structures for Statistical Computing in Python. Proceedings of the
9th Python in Science Conference.
3. Hunter, J. D. (2007). Matplotlib: A 2D Graphics Environment. Computing in Science & Engi-
neering, 9(3), 90-95.
4. McKinney, W. (2017). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media, Inc.
5. Van der Walt, S., Colbert, S. C., & Varoquaux, G. (2011). The NumPy Array: A Structure for
Efficient Numerical Computation. Computing in Science & Engineering, 13(2), 22-30.
6. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
7. Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S. J., Brett, M., Wilson, J., Millman, K. J., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson, E., Carey, C. J., Polat, İ., Feng, Y., Moore, E. W., VanderPlas, J., Laxalde, D., Perktold, J., Cimrman, R., Henriksen, I., Quintero, E. A., Harris, C. R., Archibald, A. M., Ribeiro, A. H., Pedregosa, F., van Mulbregt, P., & SciPy 1.0 Contributors (2020). SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17(3), 261-272.
8. Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., Niculae, V., Prettenhofer, P., Gramfort, A., Grobler, J., Layton, R., Vanderplas, J., Joly, A., Holt, B., & Varoquaux, G. (2013). API Design for Machine Learning Software: Experiences from the Scikit-learn Project. arXiv preprint arXiv:1309.0238.
9. VanderPlas, J. (2016). Python Data Science Handbook: Essential Tools for Working with
Data. O'Reilly Media, Inc.
10. Waskom, M. L. (2021). Seaborn: Statistical Data Visualization. Journal of Open Source Soft-
ware, 6(60), 3021.
11. McKinney, W. (2013). Pandas: A Foundational Python Library for Data Analysis and Statistics.
PyHPC: Python in High Performance Computing, 1-9.
12. Feng, J., & Lipton, Z. C. (2018). Deep Learning for Finance: Deep Portfolios. Applied Sto-
chastic Models in Business and Industry, 34(1), 120-129.
import unittest

import numpy as np
import pandas as pd
from bokeh.layouts import column
from bokeh.plotting import figure, output_file, save, show
from sqlalchemy import MetaData, create_engine
from sqlalchemy.orm import sessionmaker


def load_data_from_csv(file_path):
    """
    Load data from a CSV file into a pandas DataFrame.

    Args:
        file_path (str): Path to the CSV file.

    Returns:
        pandas.DataFrame: The loaded data.
    """
    df = pd.read_csv(file_path)
    return df
"""
Args:
"""
metadata = MetaData()
metadata.create_all(engine)
"""
Args:
"""
metadata = MetaData()
metadata.reflect(bind=engine)
table = metadata.tables[table_name]
ins = table.insert().values(row.to_dict())
session.execute(ins)
session.commit()
"""
22
Load data from a database table into a pandas DataFrame.
Args:
Returns:
"""
df = pd.read_sql(query, engine)
return df
"""
Args:
Returns:
dict: A dictionary mapping each training function to its best corresponding ideal
function.
"""
best_ideal_funcs = {}
min_ssr = float('inf')
best_func = None
23
ssr = np.sum((training_df[train_col] - ideal_df[ideal_col]) ** 2)
min_ssr = ssr
best_func = ideal_col
best_ideal_funcs[train_col] = best_func
return best_ideal_funcs
"""
Args:
best_ideal_funcs (dict): Dictionary mapping each training function to its best ideal
function.
Returns:
dict: A dictionary containing the residuals for each test data point.
"""
residuals = {}
x_val = test_row['x']
y_test = test_row['y']
closest_ideal = None
min_residual = float('inf')
24
if residual < min_residual:
min_residual = residual
closest_ideal = ideal_col
return residuals
class TestFunctions(unittest.TestCase):
    def test_load_data_from_csv(self):
        test_df = load_data_from_csv('test.csv')
        self.assertEqual(len(test_df), 10)

    def test_find_best_ideal_functions(self):
        training_df = load_data_from_csv('train.csv')  # example file name
        ideal_df = load_data_from_csv('ideal.csv')     # example file name
        best_ideal_funcs = find_best_ideal_functions(training_df, ideal_df)
        self.assertEqual(len(best_ideal_funcs), len(training_df.columns) - 1)
def visualize_training_data(training_df):
    """
    Plot every training function with Bokeh and save the result to an HTML file.

    Args:
        training_df (pandas.DataFrame): Training data with an 'x' column.
    """
    output_file("training_data.html")
    p = figure(title="Training data", x_axis_label='x', y_axis_label='y')
    for col in training_df.columns.drop('x'):
        p.line(training_df['x'], training_df[col], legend_label=col)
    p.legend.click_policy = "hide"
    save(p)
"""
Args:
residuals (dict): Dictionary containing residuals for each test data point.
"""
output_file("test_data.html")
p.legend.click_policy = "hide"
save(p)
class DataVisualizer:
    @staticmethod
    def plot_data(training_df, ideal_df, test_df, residuals, best_ideal_funcs):
        """
        Plot training data, ideal functions, test data, and residuals using Bokeh.

        Args:
            training_df (pandas.DataFrame): Training data with an 'x' column.
            ideal_df (pandas.DataFrame): Ideal functions with an 'x' column.
            test_df (pandas.DataFrame): Test points with 'x' and 'y' columns.
            residuals (dict): Dictionary containing residuals for each test data point.
            best_ideal_funcs (dict): Dictionary mapping each training function to its best ideal
            function.
        """
        p = figure(title="Training data and best ideal functions",
                   x_axis_label='x', y_axis_label='y')
        for train_col, ideal_col in best_ideal_funcs.items():
            p.line(training_df['x'], training_df[train_col], legend_label=train_col)
            p.line(ideal_df['x'], ideal_df[ideal_col], legend_label=ideal_col,
                   line_dash="dashed")
        p2 = figure(title="Test data", x_axis_label='x', y_axis_label='y')
        p2.scatter(test_df['x'], test_df['y'], legend_label="test points")
        show(column(p, p2))
def main():
    """
    Main function to orchestrate the data loading, analysis, and visualization process.
    """
    # Database setup
    engine = create_engine(r'sqlite:///C:/Users/ECS/Desktop/assignment/database.db')
    Session = sessionmaker(bind=engine)
    session = Session()

    # Load the data sets (example file names)
    training_df = load_data_from_csv('train.csv')
    ideal_df = load_data_from_csv('ideal.csv')
    test_df = load_data_from_csv('test.csv')

    # Match each training function to its best ideal function
    best_ideal_funcs = find_best_ideal_functions(training_df, ideal_df)
    print(best_ideal_funcs)

    # Visualize training data
    visualize_training_data(training_df)

    # Visualize test data
    residuals = map_test_data(test_df, ideal_df, best_ideal_funcs)
    visualize_test_data(test_df, residuals)


if __name__ == '__main__':
    main()
4. Push Changes to Remote Branch
After pushing the changes, go to the repository on GitHub and create a pull request from the develop
branch to the main branch. Provide a title and description, then submit it for review.
Once the pull request is reviewed and approved, it can be merged into the main branch via the GitHub
web interface.