Lab 0 - Environment Setup
1. Go to this link
2. This will launch a Strigo page where you might have to create an account if you’re new
to Strigo.
3. Once you’ve created an account, enter token: Y33M and click on Enter the classroom
to launch the lab environment.
4. This will begin the lab creation process, which may take a few moments. You should now
be presented with a screen as shown below. Your browser might also ask to allow
app.strigo.io to use your microphone and camera; you can allow this for now. Click on
the highlighted icon as shown below to start the lab.
5. Once the lab is loaded, you’ll be greeted with a command prompt. The lab already has
Elastic installed; we just need to start the Elasticsearch cluster. Type “start_elastic” at the
command prompt. This starts the Elasticsearch cluster and should take about 5-7
minutes to come up.
6. A process will begin to create docker containers for the required Elastic cluster nodes.
When finished, the result should look like the output below…
2. Now we will need to change to our “Elastic” tab in Strigo. We might also have to “Reload
this view”…
3. Once the page loads you should see a login prompt. Use “elastic” for the user and paste
the password from the terminal window in the password field.
4. Once logged in you’ll be greeted by a message about adding integrations. We will be
skipping this step as we do not need any integrations for this lab. Click on “Explore on
my own”.
9. Near the top of the screen select “Pipelines”. Pipelines in this context refer to
inference pipelines. These are different from the ingest pipelines that process data before
it is indexed; inference pipelines become part of ingest pipelines (see the sketch below).
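For reference, here is what the UI builds under the hood: an ingest pipeline containing an inference processor, which calls the trained model as documents are indexed. A minimal sketch only; the pipeline name, model ID, and field names here are illustrative, not the exact ones the UI will generate:

PUT _ingest/pipeline/my-inference-pipeline
{
  "processors": [
    {
      "inference": {
        "model_id": "my-text-embedding-model",
        "field_map": { "title": "text_field" },
        "target_field": "ml.inference.title-vector"
      }
    }
  ]
}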
10. Click on “Copy and customize” under Ingest Pipelines.
11. Click on “Add Inference Pipeline” in the Machine Learning Inference Pipelines box.
12. Enter “title-vector” for the name. Then select the “Dense Text Embedding” model, which
came preloaded into your cluster. Elastic allows for the import and inclusion of multiple
transformer models for different use cases. At the bottom, click “Continue”.
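If you’re curious which models came preloaded, you can list the trained models from Dev Tools later (optional; not required for the lab):

GET _ml/trained_models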
13. On the next screen, enter “title” for the Source Field; here we are telling the transformer
model which field we want to vectorize. Leave the Target Field blank and then click
“Continue” at the bottom.
14. Click “Continue” again to skip the optional test of the model, then click “Create Pipeline”.
15. Now that the pipeline is created, we need to make an adjustment to the vector
dimensions. On the left-hand menu, select “Dev Tools” in the Management section.
16. Paste the code below into the console to tell Elastic that we’re going to use 768
dimensions. We could increase this to 2048; however, that would incur additional
resource cost during ingest processing…
POST search-elastic-docs/_mapping
{
  "properties": {
    "title-vector": {
      "type": "dense_vector",
      "dims": 768,
      "index": true,
      "similarity": "dot_product"
    }
  }
}
17. Check for the following response on the right side of the screen…
{
  "acknowledged": true
}
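To double-check that the mapping change took effect, you can optionally query the field mapping (a standard Elasticsearch API call, not part of the original lab steps); the response should show the field with "type": "dense_vector" and "dims": 768:

GET search-elastic-docs/_mapping/field/title-vector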
Now we need to add an additional pipeline to compare vectorization with Elastic’s ELSER
model.
18. Navigate back to Enterprise Search
19. Click on “Indices” under Overview.
20. In the list of indices, click on “search-elastic-docs”. Notice that we entered “elastic-docs”
for the index name earlier; however, we’re referencing it as “search-elastic-docs” here.
This is because Enterprise Search prefixes search indices with “search-”.
23. On this screen we will enter similar information as before, with a few adjustments. Start
by choosing “New Pipeline” and setting the name to “title-elser”. Under models,
choose “ELSER Text Expansion”. Then click “Continue” at the bottom of the page.
24. On the next screen we’ll add a mapping. In the list of source fields, select “title”, then
click “Add” to the right. Notice that the target field is automatically named. At the bottom,
click “Continue”.
25. At the bottom click “Continue”. We’ll skip testing the model for now, so click “Continue”
again.
26. On the review page click “Create Pipeline”.
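For later reference, once documents have been ingested through this pipeline, the ELSER output can be searched with a text_expansion query. This is a minimal sketch only: the target field below assumes the auto-generated name from the mapping step, and “.elser_model_1” assumes the default ELSER model ID, so adjust both to match your cluster:

GET search-elastic-docs/_search
{
  "query": {
    "text_expansion": {
      "ml.inference.title_expanded.predicted_value": {
        "model_id": ".elser_model_1",
        "model_text": "What is ELSER?"
      }
    }
  }
}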
27. Now let's configure the crawler to capture the Elastic documentation.
On the navigation menu to the left, select Enterprise Search -> Overview
One of the rules should disallow everything else: Policy “Disallow”, Rule “Regex”, Path pattern “.*”.
The rules should look like this; note the order of the rules…
Note: If you need to reorder the rules, click on the “=” icon and drag it up or down until the order is correct.
34. Now scroll to the top of the page and click the blue button titled “Crawl”, then select
“Crawl all domains on this index”.
The Crawl button will start spinning; the crawl will take some time to complete.
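While the crawl runs, you can optionally watch documents arrive from Dev Tools (a standard count API call, not a required lab step); re-running it should show the number growing as pages are indexed:

GET search-elastic-docs/_count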
Lab 1 is complete.
Lab 2 - Setting Up the Web Front End
Connecting to OpenAI
While we wait for the crawler, we’ll set up the frontend for sending queries to Elastic.
Launching Streamlit
Now we will use an app called Streamlit to run a web-based frontend to submit our queries to
Elastic and ChatGPT.
3. To start the frontend application that will give us a webpage to interact with Elastic and
ChatGPT, run the following command…
streamlit run elasticdocs_gpt.py
5. Paste the URL into a new browser tab. Your URL should look similar to this (if you’re on
a VPN or corporate network, this port might be blocked)…
6. Press Enter and a page like below should appear. We’ll fill in the information for this
page during the next lab…
Lab 2 is complete.
Lab 3 - Ask Questions
1. Using the page we loaded in Lab 2, enter the API key for OpenAI.
(The other fields under the “Click here to enter Elasticsearch cluster connectivity
information” expansion can be ignored; they are included for those who want to run this
environment outside of this workshop.)
2. Below the inputs you’ll see a dropdown box that allows you to select from different
OpenAI models. Feel free to choose whichever you like. Keep in mind that the models
that allow for more tokens tend to give better answers; however, they also cost more.
(Also note that depending on your account type and status, some models may be
unavailable.)
3. In the prompt within the browser window, enter the question: “What is ELSER?”
The response should look like this…
4. Another question to ask would be: “Generate an elasticsearch query to search for
cows in index cow-logs.” This should respond with a properly formatted query to
search the index for cows, along the lines of the sketch below.
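The exact output varies by model, but a typical generated query might look something like this (the "message" field name is illustrative, and the cow-logs index is hypothetical; it does not exist in this cluster):

GET cow-logs/_search
{
  "query": {
    "match": {
      "message": "cows"
    }
  }
}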
5. Lastly, we can try asking an unrelated question: “How do I build a boat?”
Due to the focused data we’re using, ChatGPT is unable to answer.