Lecture 2. Python Programming Boot Camp
1. Purpose
(2) It is a good idea to collect the installation of the required packages in a single shell
script. For example, if the PowerShell script is named "RAGenv.ps1", its contents are as
follows:
<#
.SYNOPSIS
RAG environment setup script
.DESCRIPTION
This script installs the Python libraries required for developing the Retrieval-Augmented
Generation (RAG) system.
.NOTES
Author: Kazuo Hashimoto
Created: March 26, 2024
Version: 1.0
#>
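The rest of RAGenv.ps1 is not shown here, so the block below is not a translation of it. It is a minimal Python sketch of the same idea, under stated assumptions: it installs only the packages that are missing, using the current interpreter's pip. The package list is a hypothetical example, and the mapping from pip names to import names (e.g. `python-dotenv` is imported as `dotenv`) has to be maintained by hand.

```python
import importlib.util
import subprocess
import sys

# Hypothetical example list: pip package name -> module name used in `import`
REQUIRED_PACKAGES = {
    "pandas": "pandas",
    "python-dotenv": "dotenv",
    "chromadb": "chromadb",
}


def find_missing(packages: dict[str, str]) -> list[str]:
    """Return the pip names of packages whose module cannot be imported."""
    return [
        pip_name
        for pip_name, module_name in packages.items()
        if importlib.util.find_spec(module_name) is None
    ]


def install_missing(packages: dict[str, str], dry_run: bool = True) -> list[str]:
    """Install missing packages with `python -m pip install`.

    With dry_run=True (the default) the command is only printed.
    Returns the list of packages that were missing.
    """
    missing = find_missing(packages)
    if missing:
        cmd = [sys.executable, "-m", "pip", "install", *missing]
        if dry_run:
            print("Would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return missing


if __name__ == "__main__":
    install_missing(REQUIRED_PACKAGES, dry_run=True)
```

Using `sys.executable -m pip` rather than a bare `pip` makes sure the packages go into the same environment (for example the active conda environment) that will run the RAG code.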
(1) If file paths or a personal OPENAI_API_KEY are written directly into a program, the portability of the program
is significantly reduced. Therefore, we recommend keeping highly system-dependent values outside the
program, for example as environment variables.
(2) Place a .env file in the directory from which the program is run. Each line of .env has the form <environment variable>=<value>.
.env example
import os
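A .env file in this format can be read with a few lines of standard-library code. The sketch below is a simplified stand-in for the python-dotenv package (whose `load_dotenv()` does this more completely); the function names `parse_env_text` and `load_env_file` are hypothetical. It only handles `KEY=VALUE` lines, `#` comments, and one pair of surrounding quotes.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; skip blank lines, '#' comments and lines without '='."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='
        key, value = (part.strip() for part in line.split("=", 1))
        # Drop one pair of matching surrounding quotes, e.g. "sk-..." or 'sk-...'
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def load_env_file(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Read a .env file and copy its values into os.environ.

    Variables that are already set are left alone unless override=True.
    Returns the key/value pairs found in the file.
    """
    values = parse_env_text(Path(path).read_text(encoding="utf-8"))
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

Call `load_env_file()` once at startup and then read values with `os.environ.get("OPENAI_API_KEY")`, so the key never appears in the source code.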
# Source: https://www.jrce.co.jp/medical/recept/index.html
# ORCA 1 - 7
1. "What is ORCA?", "ORCA is a Japanese medical information system provided by the Japan Medical Association.
It is a medical receipt computer system for medical facilities.
It is used by 17,000 medical institutions and is the standard system in the age of electronic receipts."
2. "What are the main features of ORCA?", "The main features of ORCA are:
1. Always up to date (the latest programs are available via the Internet)
2. Reduced introduction and update costs (only the hardware needs to be replaced)
3. Relatively easy to introduce (data can be transferred from the existing receipt computer to the new system)"
3. "What kind of medical institutions would you recommend ORCA to?", "We recommend ORCA to the following medical institutions:
institutions considering a new receipt system because of a new opening or a lease expiration;
institutions that are dissatisfied with manufacturer-led medical billing systems;
and institutions that support the Japan Medical Association's philosophy of IT in the medical field and want to improve the efficiency of their medical administration."
(ORCACLOUD): ORCA has been put on the cloud, making it even safer and more convenient."
5. "When introducing ORCA, is it possible to transfer data from existing receipt computers?",
"Yes, it is possible. Basic patient information and disease names can be transferred from the receipt computer currently in use."
7. "How will ORCA be updated, and how will changes in medical fees be handled?",
"You can get the latest programs and drug masters via the Internet.
The changes are reflected automatically with a single click of a button on the screen.
Similarly, medical fee revision programs can be updated with the click of a button.
There is no cost to you."
# Source: https://www.jrce.co.jp/medical/product/onpremiseorca.html
# In-hospital server version ORCA 8 - 15
8. "What is the in-hospital server version of ORCA?", "The in-hospital server version of ORCA is …
It will be protected."
| FAQ_id | Question | Answer |
|---|---|---|
| 1 | What is ORCA? | ORCA is a medical information platform provided by the Japan Medical Association for the medical field in Japan. Currently, it is used in approximately 17,000 medical institutions, and it is growing into the standard system for the electronic receipt era. |
| 2 | Main features of ORCA | 1. It is always up to date (the latest programs and masters can be obtained via the Internet). 2. It reduces the cost of introduction and updating (only the hardware needs to be replaced). 3. It is relatively easy to introduce (data can be transferred …). |
| 3 | What types of medical institutions would you recommend ORCA for? | Medical institutions considering the introduction of a new medical receipt system due to new openings or lease expiration; medical institutions that are dissatisfied with manufacturer-driven medical receipt systems; cost-conscious medical institutions that agree with the Japan Medical Association's philosophy of IT in the medical field and aim to streamline medical institution administration. |
| 4 | What types of ORCA does JRCE offer? | … 2. Cloud version ORCA (ORCACLOUD): a type where ORCA is put on the cloud, further improving safety and convenience. |
| 5 | Is it possible to transfer data from an existing receipt computer? | Yes, it is possible. Basic patient information and disease name information can be transferred from the medical receipt computer currently in use. However, depending on the manufacturer, data transfer may not be possible, so please contact us for details. |
| 6 | How long does it take to learn how to use ORCA? | It is generally said that you can learn to use it after just two operating-instruction sessions. In addition, JMA IT certified instructors support smooth business operations, and call support is also available. |
| 7 | How will you respond to ORCA updates and medical fee revisions? | You can obtain the latest programs and master data such as medicines via the Internet, and they are reflected automatically by simply clicking a button on the screen. Similarly, medical fee revision programs can be updated by clicking a button, and there is no additional cost. |
| 8 | What is the in-hospital server version of ORCA? | The in-hospital server version of ORCA is a … backup is standard, and a system configuration that suits the customer's … With the recommended system on two computers, even if one machine breaks down, the other can … |
| 10 | How is data security ensured in a single ORCA configuration? | If you have a single ORCA system, we recommend that you back up your data after each consultation and save it to a USB memory stick. If you have a computer problem, a technician will visit you and perform recovery work from the backup data. |
Main functions:
1. Reading the FAQ text file
2. Analyzing the FAQ content (extracting FAQ_id, Question_id, Answer_id)
3. Converting the analyzed data to a DataFrame
4. Outputting the DataFrame to an Excel file
import pandas as pd
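The four steps listed above can be sketched as follows. This is a minimal sketch that assumes each FAQ entry in the text file has the form `N. "Question", "Answer"` (the answer may span several lines but must not contain double quotes). The function names are hypothetical, and `DataFrame.to_excel` needs an Excel writer such as openpyxl to be installed.

```python
import re

# One FAQ entry: number, period, quoted question, comma, quoted answer.
# re.DOTALL lets the quoted answer span several lines.
FAQ_PATTERN = re.compile(r'(\d+)\.\s*"(.*?)"\s*,\s*"(.*?)"', re.DOTALL)


def parse_faq_text(text: str) -> list[dict]:
    """Steps 1-2: extract FAQ_id, Question and Answer from the FAQ text."""
    records = []
    for match in FAQ_PATTERN.finditer(text):
        faq_id, question, answer = match.groups()
        records.append({
            "FAQ_id": int(faq_id),
            # Collapse line breaks and repeated spaces into single spaces
            "Question": " ".join(question.split()),
            "Answer": " ".join(answer.split()),
        })
    return records


def faq_file_to_excel(faq_path: str, excel_path: str) -> None:
    """Steps 1-4: read the FAQ file, build a DataFrame and write it to Excel."""
    import pandas as pd  # imported here so the parser above works without pandas

    with open(faq_path, encoding="utf-8") as f:
        records = parse_faq_text(f.read())
    df = pd.DataFrame(records, columns=["FAQ_id", "Question", "Answer"])
    df.to_excel(excel_path, index=False)
```

Comment lines such as `# ORCA 1 - 7` and the source URLs do not match the pattern, so they are skipped automatically.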
(2) If a document is long, split it into chunks of an appropriate length (e.g. 1000 characters)
using the split_documents function. In the FAQ case below, however, the strings in the Question
and Answer columns are unlikely to exceed 1000 characters, so processing with split_documents is
unnecessary and would only degrade performance.
return documents
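Because each FAQ entry is already short, one document per row is enough. The sketch below uses a small hypothetical `Doc` dataclass in place of LangChain's `Document` (which likewise carries `page_content` and `metadata`) so that it runs without LangChain; putting the question and answer together in the text and the FAQ_id in the metadata is an assumption, not necessarily what the original load_faq does.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def faq_records_to_docs(records: list[dict], source: str = "faq") -> list[Doc]:
    """Create exactly one document per FAQ row; no chunk splitting is applied."""
    docs = []
    for rec in records:
        text = f"Question: {rec['Question']}\nAnswer: {rec['Answer']}"
        metadata = {"source": source, "FAQ_id": rec["FAQ_id"]}
        docs.append(Doc(page_content=text, metadata=metadata))
    return docs
```

If LangChain is installed, `Document` from `langchain_core.documents` (or `langchain.schema` in older releases) can be used instead of `Doc` with the same keyword arguments.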
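from langchain_community.document_loaders import PyPDFLoader  # older LangChain: langchain.document_loaders


def load_service_manual(file_path):
    """Load a PDF service manual; PyPDFLoader returns one Document per page."""
    loader = PyPDFLoader(file_path)
    return loader.load()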
(2) To prevent context information from being lost when a document is divided, define
chunk_overlap. Here, chunks overlap by 20%, but you need to find the optimal value for
the target document.
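The effect of chunk_size and chunk_overlap can be seen with a simplified fixed-window splitter. This is a sketch for illustration only, with a hypothetical function name: LangChain's `RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)` also prefers to break at paragraph and sentence separators, which this version does not.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Consecutive chunks share chunk_overlap characters (200 / 1000 = 20%),
    so any passage no longer than the overlap that is cut at one chunk
    boundary still appears whole in the next chunk.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For a 2,500-character text this yields three chunks starting at positions 0, 800 and 1600; a larger overlap protects more context but produces more (and more redundant) chunks to embed.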
```shell
conda install onnxruntime -c conda-forge
```
See this
[thread](https://github.com/microsoft/onnxruntime/issues/11037) for additional help if needed.
```shell
pip install -r requirements.txt
```
```shell
pip install "unstructured[md]"
```
## Create database
```shell
python create_database.py
```
```shell
python query_data.py "How does Alice meet the Mad Hatter?"
```
> You'll also need to set up an OpenAI account (and set the OpenAI key in your
> environment variable) for this to work.
(2) create_database.py
- This is a basic program that creates a VectorStore using ChromaDB.
(3) query_data.py
- This program queries the database created above. Run it in your own environment.
- In this program, the question is passed as a command-line argument when the program runs.
Modify it so that it accepts input from the user and returns an answer.
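One way to make this modification is sketched below: if a question is given on the command line it is answered once, as before; otherwise the program keeps reading questions until the user enters `q`. The `answer_question` callable is a hypothetical placeholder for whatever query_data.py does with the database (similarity search plus the LLM call).

```python
import sys
from typing import Callable, Optional


def run(answer_question: Callable[[str], str],
        argv: Optional[list[str]] = None,
        read_input: Callable[[str], str] = input) -> list[str]:
    """Answer a command-line question, or loop over user input until 'q'.

    The loop also ends at end of input (EOF). Returns the answers printed.
    """
    args = sys.argv[1:] if argv is None else argv
    if args:  # original behaviour: the question is a command-line argument
        answer = answer_question(" ".join(args))
        print("answer:", answer)
        return [answer]

    answers = []
    while True:
        try:
            question = read_input("Enter your question (press 'q' to quit): ").strip()
        except EOFError:
            break
        if question.lower() == "q":
            break
        if not question:  # ignore empty lines instead of querying with ""
            continue
        answer = answer_question(question)
        print("answer:", answer)
        answers.append(answer)
    return answers
```

Passing `read_input` as a parameter keeps the loop testable without a keyboard; in query_data.py it simply defaults to the built-in `input`.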
(5) compare_embeddings.py
compare_embeddings.py calculates the vector distance between words. Please modify
it to calculate the vector distance between sentences.
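Embedding models typically map a whole sentence to a single vector in the same way as a single word, so comparing sentences mainly means passing sentences instead of words. The sketch below computes cosine distance (1 minus cosine similarity) between two embedding vectors. `embed` stands for any function that returns a list of floats; with LangChain, `OpenAIEmbeddings().embed_query` is one such function. The function names are hypothetical, and this is not necessarily the metric the original script uses.

```python
import math
from typing import Callable, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity: 0 for the same direction, 2 for opposite."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cannot compare a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def sentence_distance(embed: Callable[[str], Sequence[float]], s1: str, s2: str) -> float:
    """Embed two sentences with the same model and return their cosine distance."""
    return cosine_distance(embed(s1), embed(s2))
```

For example, `sentence_distance(OpenAIEmbeddings().embed_query, "How is ORCA updated?", "How are ORCA updates delivered?")` should give a smaller distance than comparing either question with an unrelated sentence.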
6 RAG System
RAG.Sample.py is the skeleton code for the RAG system.
6.1 Each function has a basic error-handling mechanism built in, using the try <code> except <error-handling-code>
syntax.
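In isolation, the pattern used throughout the skeleton (log the error, then return a safe fallback such as an empty list) looks like this. `load_lines` is a hypothetical loader used only to illustrate the pattern.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")


def load_lines(file_path: str) -> list[str]:
    """Return the non-empty lines of a text file, or [] if it cannot be read."""
    try:
        with open(file_path, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f if line.strip()]
    except FileNotFoundError:
        # Specific case first: gives a clearer log message
        logging.error(f"File not found: {file_path}")
        return []
    except Exception as e:
        # Any other problem: log it and fall back to an empty list
        logging.error(f"Failed to load {file_path}: {e}")
        return []
```

The broad `except Exception` keeps one unreadable file from stopping the whole program, at the cost of hiding bugs. Functions whose failure makes the system unusable, such as the VectorStore manager, therefore log and re-raise instead of returning a fallback.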
import hashlib


def get_file_hash(file_path):
    """Compute the MD5 hash of a file."""
    hash_md5 = hashlib.md5()
    with open(file_path, "rb") as f:
        # Read in 4 KB blocks so large files do not have to fit in memory
        for chunk in iter(lambda: f.read(4096), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()
metadata=metadata)
documents.append(doc)
except Exception as e:
logging.error(f"Failed to load FAQ: {e}")
return []
6.4 VectorStore
1. Checking for updates:
- Get the last update time of the external data (e.g. FAQs).
- Compare it with the previous update time stored in an environment variable.
2. Branching based on whether updates exist:
a) If there is an update:
3. Error handling:
- Handle errors that occur while reading/writing files or processing external data.
            vector_store = Chroma(persist_directory=vector_store_path,
                                  embedding_function=embeddings)
        else:
            if updated_files:
                logging.info(f"The following files have been updated; recreating the VectorStore: {', '.join(updated_files)}")
            else:
                logging.info("Creating a new VectorStore")
            vector_store = Chroma.from_documents(documents, embeddings,
                                                 persist_directory=vector_store_path)
            vector_store.persist()
        return vector_store
    except Exception as e:
        logging.error(f"An error occurred while managing the VectorStore: {e}")
        raise
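The update check described in 6.4 compares each file's current state with the state recorded last time. The sketch below swaps in MD5 hashes (the same idea as get_file_hash) for update times, which avoids false positives when a file is touched but not changed. Storing the previous values in a JSON file rather than an environment variable, and all function names, are assumptions for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_md5(file_path: str) -> str:
    """MD5 of a file, read in 4 KB blocks (same idea as get_file_hash)."""
    md5 = hashlib.md5()
    with open(file_path, "rb") as f:
        for block in iter(lambda: f.read(4096), b""):
            md5.update(block)
    return md5.hexdigest()


def load_hashes(state_path: str) -> dict[str, str]:
    """Load the hashes saved by the previous run ({} on the first run)."""
    path = Path(state_path)
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def save_hashes(state_path: str, hashes: dict[str, str]) -> None:
    """Save hashes for the next run; call this only after a successful rebuild."""
    Path(state_path).write_text(json.dumps(hashes, indent=2), encoding="utf-8")


def find_updated_files(file_paths: list[str], previous: dict[str, str]):
    """Return (updated_files, current_hashes).

    A file counts as updated if it is new or its hash has changed.
    """
    current = {p: file_md5(p) for p in file_paths}
    updated = [p for p in file_paths if previous.get(p) != current[p]]
    return updated, current
```

The returned list plays the role of `updated_files` above: if it is empty, the existing Chroma store can simply be loaded; otherwise the store is rebuilt and `save_hashes` is called afterwards, so a failed rebuild is retried on the next run.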
Now define the main process using the functions defined so far.
(1) VECTOR_STORE_PATH is set to the directory where the script file is located.
It appears to be preset this way, but it might be better to define a fixed path.
If a better format definition is found from the standpoint of usability and debugging, we will improve it.
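Both options for VECTOR_STORE_PATH can be combined: default to a folder next to the script, but let an environment variable (for example one set in .env) override it with a fixed path. Using `VECTOR_STORE_PATH` as the environment variable name, the folder name `vector_store`, and the function name are assumptions in this sketch.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_vector_store_path(script_file: str,
                              env: Optional[Mapping[str, str]] = None,
                              folder_name: str = "vector_store") -> Path:
    """Return the VectorStore directory.

    A VECTOR_STORE_PATH environment variable (a fixed path) takes priority;
    otherwise the folder is placed next to the script, independent of the
    current working directory.
    """
    env = os.environ if env is None else env
    fixed = env.get("VECTOR_STORE_PATH")
    if fixed:
        return Path(fixed).expanduser().resolve()
    return Path(script_file).resolve().parent / folder_name
```

In the script this would be used as `VECTOR_STORE_PATH = str(resolve_vector_store_path(__file__))`; the fixed path makes debugging easier because the store no longer moves when the script is run from a different directory.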
# Logging configuration
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
template = """Please answer the user's most recent question using the conversation history and information below.
Please answer concisely in easy-to-understand Japanese. Avoid making
uninformed guesses, and if you do not have the information, say "That information is not provided."

Conversation history: {chat_history}

References:
{context}

Question: {question}
answer:"""
PROMPT = PromptTemplate(
    input_variables=["chat_history", "context", "question"],
    template=template,
)
(5) Main processing script
# Main processing
def main():
    try:
        faq_data = load_faq(FAQ_FILE)
        inquiry_history = load_inquiry_history(INQUIRY_HISTORY_FILE)
        service_manual = load_service_manual(SERVICE_MANUAL_FILE)
        vector_store = manage_vector_store(split_data,
                                           VECTOR_STORE_PATH, openai_api_key, FILE_PATHS)

        template = """Please answer the user's most recent question using the conversation history and information below.
Please answer concisely in easy-to-understand Japanese. Avoid making
uninformed guesses, and if you do not have the information, say "That information is not provided."

Conversation history: {chat_history}

References:
{context}

Question: {question}
answer:"""
        PROMPT = PromptTemplate(
            input_variables=["chat_history", "context", "question"],
            template=template,
        )

        # Initialize memory
        memory = ConversationBufferMemory(memory_key="chat_history",
                                          return_messages=True)

        # LLM settings are not shown in the original; ChatOpenAI with default settings is assumed here
        qa_chain = ConversationalRetrievalChain.from_llm(
            llm=ChatOpenAI(openai_api_key=openai_api_key),
            retriever=vector_store.as_retriever(),
            memory=memory,
            combine_docs_chain_kwargs={"prompt": PROMPT},
        )

        # Question-answering loop
        while True:
            question = input("Enter your question (press 'q' to quit): ")
            if question.lower() == 'q':
                break
            try:
                result = qa_chain({"question": question})
                print("answer:", result['answer'])
            except Exception as e:
                logging.error(f"An error occurred while answering the question: {e}")
    except FileNotFoundError as e:
        logging.error(f"File not found: {e}")
    except Exception as e:
        logging.error(f"An error occurred while running the program: {e}")