Crit C Development
Contents
Dependencies
ProjectMonitor:
Project Manager:
Summary of Techniques Used
File and Directory manipulation through the python os and shutil libraries
ProjectManager (Front-end)
Symmetric Key Encryption (“Encrypt and Decrypt Files Using Python”)
File and Directory manipulation through the python os and shutil libraries
CSV Manipulation
ProjectMonitor (Back-End)
Threading (Threading — Thread-Based Parallelism — Python 3.10.8 Documentation)
Cosine Similarity and Natural Language Processing (Cosine Similarity - Text Similarity Metric - Machine Learning Tutorials)
The HoursWorked function
pandas DataFrame
Data Visualization using seaborn and MatPlotLib(Waskom)
Converting the HTML Receipt template to PDF using pdfkit
References
Dependencies
ProjectMonitor:
1. pandas – An external Python library for processing data into a form that is usable for visualization (McKinney)
2. seaborn – An external Python library for producing data visualizations, such as graphs, from processed data (Waskom)
3. spaCy – An external Python library that allows the similarity of two text files to be compared using cosine
similarity and natural language processing
4. pyca/cryptography – An external Python library that allows the secure encryption and decryption of stored files
5. threading – A Python standard library module that allows threading, which benefits efficiency and process timing
6. pdfkit – An external Python library that converts filled HTML templates into formatted PDFs
ProjectManager:
1. pyca/cryptography – An external Python library that allows the secure encryption and decryption of the
projects.csv file
2. pandas
Together, this selection of techniques addresses all 11 of the success criteria; how effectively each one is met
depends on client review.
ProjectManager (Front-end)
Symmetric Key Encryption (“Encrypt and Decrypt Files Using Python”)
Since ProjectManager runs in user-driven instances while ProjectMonitor runs indefinitely, a newly generated encryption
key for projects.csv cannot be shared directly between the two applications. As a result, both store the same randomly
generated key as a hard-coded variable. The key is still relatively secure since the programs will be run as
executables, which do not produce readily comprehensible code upon reverse engineering. This adheres to success
criterion 4. This could be improved by regularly updating the encryption key, by generating a new key for every
instance, or by switching to asymmetric key encryption.
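As a minimal sketch of the idea, assuming the Fernet recipe from the pyca/cryptography library, the same key can encrypt and then decrypt the contents of projects.csv. The key is generated at run time here purely for illustration (the applications hard-code it instead), and the helper names and the sample row are hypothetical.

```python
from cryptography.fernet import Fernet

# Assumption: in the real applications this key is a hard-coded constant shared
# by ProjectManager and ProjectMonitor; it is generated here only for the demo.
SHARED_KEY = Fernet.generate_key()


def encrypt_bytes(plaintext: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Return a Fernet token that contains the encrypted plaintext."""
    return Fernet(key).encrypt(plaintext)


def decrypt_bytes(token: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Return the original bytes; raises InvalidToken if the key does not match."""
    return Fernet(key).decrypt(token)


# Hypothetical row in the style of projects.csv.
sample = b"TestProj,<ProjectCWD>\n"
token = encrypt_bytes(sample)
restored = decrypt_bytes(token)
```

Because the scheme is symmetric, any program holding the same key can read the file, which is why both applications need an identical copy of it.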
Sample projects.csv before encryption, where <ProjectCWD> is representative of the project current working
directory:
Sample projects.csv after Fernet encryption (has been compressed into several lines to fit on the page):
File and Directory manipulation through the python os and shutil libraries
The os library in Python was used to create and maintain new local directories in the program installation directory
to store the previous versions of the project files and the updated visualizations. This aided organization and error
prevention, thereby addressing success criteria 3, 11 and 12.
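The sketch below illustrates the general approach with os and shutil: create a per-project directory under the installation directory if it is missing, then copy the current project files into it. The function name, the "versions" folder and the "v1" naming scheme are assumptions made for this example, not the layout the application actually uses.

```python
import os
import shutil
import tempfile


def snapshot_project(project_dir: str, install_dir: str, version: int) -> str:
    """Copy project_dir into install_dir/<project name>/versions/v<version>."""
    name = os.path.basename(os.path.normpath(project_dir))
    versions_dir = os.path.join(install_dir, name, "versions")
    # exist_ok avoids an error when the directory was created on an earlier run.
    os.makedirs(versions_dir, exist_ok=True)
    target = os.path.join(versions_dir, f"v{version}")
    # copytree creates the target itself, so the target must not exist yet.
    shutil.copytree(project_dir, target)
    return target


# Demo inside a temporary directory so nothing permanent is written.
with tempfile.TemporaryDirectory() as root:
    project = os.path.join(root, "TestProj")
    os.makedirs(project)
    with open(os.path.join(project, "notes.txt"), "w") as handle:
        handle.write("first draft")
    copied = snapshot_project(project, os.path.join(root, "install"), 1)
    copied_files = os.listdir(copied)
```

Keeping every snapshot in its own folder means an earlier version is never overwritten, which is where the error-prevention benefit comes from.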
CSV Manipulation
A CSV file was sufficient for this application as it is compatible with many of the complex techniques I needed to
use. CSV files also proved easy to manipulate without any external libraries. This adheres to success criteria 1
and 2 since resource usage is kept to a minimum. Also, most database solutions would require an app to be always
running in the foreground, resulting in both decreased performance and a direct impact on Ms. Jaya’s working
environment. With just two lines of code, any project can be stored in the projects.csv file. Because each project is
represented by only a single record, classes were not required for this application.
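A minimal sketch of storing a project with Python's built-in csv module follows. The two-column layout (project name, project directory) and the function names are assumptions for illustration, and encryption is left out to keep the example short.

```python
import csv
import os
import tempfile


def store_project(csv_path, name, directory):
    """Append one project record to the CSV file, creating the file if needed."""
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "a", newline="") as handle:
        csv.writer(handle).writerow([name, directory])


def load_projects(csv_path):
    """Return every stored record as a list of [name, directory] rows."""
    with open(csv_path, newline="") as handle:
        return list(csv.reader(handle))


# Demo in a temporary directory; <ProjectCWD> is the placeholder used above.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "projects.csv")
    store_project(path, "TestProj", "<ProjectCWD>")
    store_project(path, "Essay", "<ProjectCWD>")
    rows = load_projects(path)
```

The body of store_project is the two-line pattern: open the file in append mode and write a single row.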
Code for the CreateProject function, which uses all 3 of the above techniques
Copied Directory
ProjectMonitor (Back-End)
Threading (Threading — Thread-Based Parallelism — Python 3.10.8 Documentation)
Both the WorkedHours and ProjectComplete functions needed to run in parallel with each other and be computed for
different files in a short period of time. However, the former operates on a 15-minute time frame and the latter on a
5-minute one. Python’s standard serial execution is therefore not sufficient. This leaves 3 options: threading,
multiprocessing and asynchronous functions. Ultimately, the Python implementation of threading seemed far more
efficient for so few concurrent tasks, and it was unnecessary to increase the complexity to the extent of either other
option. The application of this technique helps to address success criteria 1, 2 and 3.
As the effects of threading within this application are difficult to show, a sample application replicating the
technique on a simpler and smaller scale is shown below.
The results of this algorithm show alternating output between two different functions simply due to the threading
module.
In standard Python, the first function would have run to completion before the second started. This is shown
below.
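The comparison can be sketched with Python's threading module as follows. This is an illustrative stand-in rather than the sample application itself; the function names, labels and delays are assumptions.

```python
import threading
import time

output = []                     # shared log of which function ran when
output_lock = threading.Lock()  # guards the shared list


def repeat(label: str, times: int, delay: float) -> None:
    """Record `label` a number of times, pausing between entries."""
    for step in range(times):
        with output_lock:
            output.append(f"{label} {step}")
        time.sleep(delay)  # sleeping releases the GIL so the other thread can run


def run_threaded() -> list:
    """Run both loops at the same time on separate threads."""
    output.clear()
    threads = [
        threading.Thread(target=repeat, args=("first", 3, 0.05)),
        threading.Thread(target=repeat, args=("second", 3, 0.05)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait for both loops to finish
    return list(output)


def run_serial() -> list:
    """Run the first loop to completion, then the second."""
    output.clear()
    repeat("first", 3, 0.05)
    repeat("second", 3, 0.05)
    return list(output)
```

Calling run_serial() always lists every "first" entry before any "second" entry. Calling run_threaded() typically interleaves the two, although the exact order depends on the scheduler and is not guaranteed.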
Cosine Similarity and Natural Language Processing (Cosine Similarity - Text Similarity Metric - Machine Learning
Tutorials)
This application uses cosine similarity to efficiently measure the difference between the previous version and current
version of a project file in a manner that is not affected by the size or type of the document, addressing success
criteria 2, 5, 6, and 11. The code snippet below shows the implementation of this technique in the WorkedHours
function using the Python spaCy library.
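To illustrate the metric itself, here is a small pure-Python sketch of cosine similarity over word counts. It is not the spaCy-based implementation used in WorkedHours: it swaps the library's language model for a simple bag-of-words vector so that the arithmetic is visible, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Turn a text into a bag-of-words vector (word -> number of occurrences)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(first: str, second: str) -> float:
    """Return the cosine of the angle between the two word-count vectors."""
    a, b = word_counts(first), word_counts(second)
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(n * n for n in a.values())) * math.sqrt(
        sum(n * n for n in b.values())
    )
    # An empty text has no direction, so treat its similarity as 0.
    return dot / norm if norm else 0.0
```

Since the result depends on the angle between the vectors rather than their length, a long document and a short one with the same word proportions still score close to 1, which is the size independence mentioned above.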
pandas DataFrames
The pandas DataFrame is a versatile, size-mutable, two-dimensional tabular data structure. The pandas library ensures
that processing on this data structure is well optimized for performance. It is also highly compatible with the CSV
format and with seaborn. This technique was used in both applications, but a greater extent of its capabilities is
used in ProjectMonitor. This allows the application to run more efficiently, and therefore addresses success
criterion 2.
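A brief sketch of these properties, assuming a hypothetical log of hours worked (the column names and values are invented for the example): the CSV text loads directly into a DataFrame, a row is added afterwards, and the totals per project are computed in one call.

```python
import io

import pandas as pd

# Hypothetical CSV contents; the real application's columns may differ.
raw_csv = io.StringIO(
    "project,date,hours\n"
    "TestProj,2022-11-01,1.5\n"
    "TestProj,2022-11-02,2.0\n"
    "Other,2022-11-01,0.5\n"
)

# CSV compatibility: read_csv builds the table in a single call.
hours = pd.read_csv(raw_csv)

# Size mutability: a new record can be appended after creation.
new_row = pd.DataFrame([{"project": "Other", "date": "2022-11-02", "hours": 1.0}])
hours = pd.concat([hours, new_row], ignore_index=True)

# Vectorized processing: total hours per project without an explicit loop.
totals = hours.groupby("project")["hours"].sum()
```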
Data Visualization using seaborn and Matplotlib (Waskom)
Although a variety of visualization libraries could have been chosen, such as Matplotlib, I chose seaborn because it
is highly compatible with the DataFrame data structure above. The library also provides many built-in options for the
visualizations it produces. Below is a snippet from the function VisualizePastHours that details how the library is
applied in this application. The sample bar plot in criterion B was produced by inputting random, normal data into
the seaborn_barplot function, so the bar chart produced from the sample data is the same as the one present in
criterion B. This helped fulfil success criterion 7.
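As a rough sketch of how a DataFrame can be passed straight to seaborn, the example below draws a bar plot from random sample data and saves it as an image. It is not the VisualizePastHours code: the function name, column names, file name and the mean and spread of the sample data are all assumptions.

```python
import os
import random
import tempfile

import matplotlib

matplotlib.use("Agg")  # render off-screen; no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_hours(frame: pd.DataFrame, output_path: str) -> str:
    """Save a bar plot of hours per day and return the path of the image."""
    fig, ax = plt.subplots(figsize=(6, 4))
    # seaborn reads the columns by name directly from the DataFrame.
    sns.barplot(data=frame, x="day", y="hours", ax=ax)
    ax.set_xlabel("Day")
    ax.set_ylabel("Hours worked")
    fig.savefig(output_path)
    plt.close(fig)  # free the figure once it has been written to disk
    return output_path


# Hypothetical sample data drawn from a normal distribution.
random.seed(0)
sample = pd.DataFrame(
    {
        "day": ["Mon", "Tue", "Wed", "Thu", "Fri"],
        "hours": [abs(random.gauss(2.0, 0.5)) for _ in range(5)],
    }
)

with tempfile.TemporaryDirectory() as folder:
    saved = plot_hours(sample, os.path.join(folder, "past_hours.png"))
    plot_exists = os.path.exists(saved)
```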
Saved in TestProj directory
Converting the HTML Receipt template to PDF using pdfkit
Due to the formatting restrictions imposed by plain-text files, HTML files were chosen as the receipt template
format. The pdfkit library seemed to be the most efficient PDF conversion library in terms of code length.
This technique addresses success criterion 8.
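The two steps can be sketched as follows: fill an HTML template, then hand the resulting string to pdfkit. The template, its field names and both function names are hypothetical. Note that pdfkit also relies on the wkhtmltopdf program being installed, so the conversion function is defined here but not called.

```python
from string import Template

# Hypothetical receipt template; $client, $hours and $total are placeholders.
RECEIPT_TEMPLATE = Template(
    "<html><body>"
    "<h1>Receipt</h1>"
    "<p>Client: $client</p>"
    "<p>Hours worked: $hours</p>"
    "<p>Total due: $total</p>"
    "</body></html>"
)


def fill_receipt(client: str, hours: float, rate: float) -> str:
    """Return the receipt HTML with every placeholder filled in."""
    return RECEIPT_TEMPLATE.substitute(
        client=client,
        hours=f"{hours:.2f}",
        total=f"{hours * rate:.2f}",
    )


def save_receipt_pdf(html: str, output_path: str) -> None:
    """Convert the filled HTML string to a PDF file at output_path."""
    # Imported lazily because pdfkit needs the wkhtmltopdf binary to be present.
    import pdfkit

    pdfkit.from_string(html, output_path)


receipt_html = fill_receipt("Sample Client", 2.5, 20.0)
```

The conversion itself is a single call, for example save_receipt_pdf(receipt_html, "receipt.pdf"), which is consistent with the short code length noted above.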
Location of receipt in project directory:
Word Count: 1181
References
Cosine Similarity - Text Similarity Metric - Machine Learning Tutorials. https://studymachinelearning.com/cosine-
“Encrypt and Decrypt Files Using Python.” GeeksforGeeks, 13 Jan. 2021,
https://www.geeksforgeeks.org/encrypt-and-decrypt-files-using-python/.
Waskom, Michael. “Seaborn: Statistical Data Visualization.” Journal of Open Source Software, vol. 6, no. 60, Apr. 2021,