This document outlines the steps for setting up a lab environment using Strigo, including creating an account, launching the lab, and starting an Elasticsearch cluster. It then walks through setting up a web crawler to index the Elastic documentation, configuring inference pipelines for vectorization, connecting to OpenAI, and launching a Streamlit web frontend to interact with Elastic and ChatGPT.

Lab 0 - Environment Setup

1. Go to this link
2. This will launch a Strigo page where you might have to create an account if you’re new
to Strigo.

3. Once you’ve created an account, enter token: Y33M and click on Enter the classroom
to launch the lab environment.
4. This will begin the lab creation process, which may take a few moments. You should now
be presented with a screen as shown below. Your browser may also ask you to allow
app.strigo.io to use your microphone and camera; you can allow this for now. Click on
the highlighted icon as shown below to start the lab.
5. Once the lab is loaded, you'll be greeted with a command prompt. The lab already has
Elastic installed; we will now start our Elastic cluster. Type "start_elastic" at the
command prompt. This starts the Elasticsearch cluster and should take about 5-7
minutes to come up.
6. A process will begin to create Docker containers for the required Elastic cluster nodes.
When finished, the result should look like below…
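
As an optional sanity check, you can inspect the containers and query the cluster directly from the same prompt. This is only a sketch; the container names, port, scheme, and password in your lab may differ…

# List the running Elasticsearch containers (names will vary)
docker ps

# Query cluster health; use the password shown by get_elastic_password (see Lab 1)
curl -k -u "elastic:<your-password>" "https://localhost:9200/_cluster/health?pretty"

A "status" of "green" or "yellow" in the response means the cluster is up.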

Congratulations, you are now ready for Lab 1!


Lab 1 - Setup the Web Crawler:
1. To get started, navigate to your Strigo lab in your browser. At the command prompt run
the script “get_elastic_password”. This will show the generated password for your Elastic
cluster.

2. Now we will need to change to our “Elastic” tab in Strigo. We might also have to “Reload
this view”…
3. Once the page loads you should see a login prompt. Use “elastic” for the user and paste
the password from the terminal window in the password field.
4. Once logged in you’ll be greeted by a message about adding integrations. We will be
skipping this step as we do not need any integrations for this lab. Click on “Explore on
my own”.

5. On the resulting page, click on “Enterprise Search”


6. In the middle of the next page click “Create an Elasticsearch Index”.
7. On the next screen choose “Use a Web Crawler”.

8. Be sure to name it "elastic-docs", then click "Create Index". Naming it elastic-docs is
important because the code for the next lab will reference this index by name.

9. Near the top of the screen select "Pipelines". Pipelines in this context refer to inference
pipelines, which are different from the ingest pipelines that process data before it is
indexed; inference pipelines become part of ingest pipelines.
10. Click on “Copy and customize” under Ingest Pipelines.
11. Click on “Add Inference Pipeline” in the Machine Learning Inference Pipelines box.

12. Enter "title-vector" for the name. Then select the "Dense Text Embedding" model, which
came preloaded into your cluster. Elastic allows for the import and use of multiple
transformer models for different use cases. At the bottom, click "Continue".
13. On the next screen, enter “title” for the Source Field. Leave the Target Field blank and
then click continue at the bottom. Here we are telling the transformer model which field
we want to apply the vectorization to.
14. Click "Continue" again to skip the optional test of the model, then click "Create Pipeline".

15. Now that the pipeline is created, we need to make an adjustment to the vector
dimensions. On the left-hand menu select "Dev Tools" in the Management section.
16. Paste the code below into the console to tell Elastic that we're going to use 768
dimensions. We could increase this to 2048; however, that would incur additional
resource cost during ingest processing…

POST search-elastic-docs/_mapping
{
  "properties": {
    "title-vector": {
      "type": "dense_vector",
      "dims": 768,
      "index": true,
      "similarity": "dot_product"
    }
  }
}

17. Check for the following response on the right side of the screen…

{
  "acknowledged": true
}
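
As an optional check, you can ask Elastic to echo the field mapping back in the same console; the response should report "type": "dense_vector" with "dims": 768…

GET search-elastic-docs/_mapping/field/title-vector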

Now we need to add an additional pipeline to compare vectorization with Elastic’s ELSER
model.
18. Navigate back to Enterprise Search
19. Click on “Indices” under overview

20. In the list of indices click on "search-elastic-docs". Notice that we entered "elastic-docs"
for the index name earlier; however, we're referencing it by "search-elastic-docs" here.
This is because Enterprise Search prefixes search indices with "search-".

21. Near the top of the next screen click on “Pipelines”...


22. In the inference pipeline section we’ll add another pipeline like we did for vectors. Click
on “Add Inference Pipeline”…

23. On this screen we will enter similar information as before with a few adjustments. Let’s
start with choosing “New Pipeline” and then setting the name to “title-elser”. Under
models we’ll choose “Elser Text Expansion”. Then click “Continue” at the bottom of the
page.
24. On the next screen we'll add a mapping. In the list of source fields select "title", then
click "Add" to the right. Notice that the target field is automatically named. At the bottom
click "Continue".

25. At the bottom click "Continue". We'll skip testing the model for now, so click "Continue"
again.
26. On the review page click “Create Pipeline”.
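
Once the crawl later in this lab has populated the index, you could try ELSER directly from Dev Tools with a text_expansion query. This is only a sketch: the target field below assumes the auto-generated name from the mapping step, and the model id may be different in your cluster…

GET search-elastic-docs/_search
{
  "query": {
    "text_expansion": {
      "ml.inference.title_expanded.predicted_value": {
        "model_id": ".elser_model_1",
        "model_text": "how do I set up a web crawler"
      }
    }
  }
}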

27. Now let's configure the crawler to capture the Elastic documentation.
On the navigation menu to the left, select Enterprise Search -> Overview

28. Under Content click on “Indices”.

29. Under “Available Indices” click on “search-elastic-docs”.


30. Click on the "Manage Domains" tab and enter "https://www.elastic.co/guide/en", then
click "Validate Domain". This checks that the domain we want to index is available and
doesn't have any limitations, such as a robots.txt file.

31. You’ll get a warning about robots.txt. This can be ignored.

32. After the checks complete click “Add Domain”.


33. Then click "Crawl Rules" and add the following rules one at a time. These rules make
sure that we don't index data we don't need or that won't help us in this use case. Rules
can use different match formats and can be ordered to apply specific logic.

Disallow Regex .*

Allow Regex /guide/en/.*/current/.*

Disallow Contains release-notes

The rules should look like this; note the order of the rules…

Note: If you need to reorder the rules click on the “=” sign and drag up or down until correct.
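
If you're curious how the rules combine, below is a rough Python approximation (not the crawler's actual implementation). Rules are checked top to bottom as they appear on screen and the first match decides, which is why the catch-all Disallow .* must sit last; the sample paths are made up for illustration.

import re

# Rough approximation of the crawl rules above; first matching rule wins.
RULES = [
    ("disallow", lambda path: "release-notes" in path),                      # Disallow / Contains
    ("allow", lambda path: re.fullmatch(r"/guide/en/.*/current/.*", path)),  # Allow / Regex
    ("disallow", lambda path: True),                                         # Disallow / Regex .*
]

def allowed(path):
    for policy, matches in RULES:
        if matches(path):
            return policy == "allow"
    return True

# Hypothetical sample paths
print(allowed("/guide/en/elasticsearch/current/search.html"))         # True
print(allowed("/guide/en/elasticsearch/current/release-notes.html"))  # False
print(allowed("/blog/some-post"))                                     # False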

34. Now scroll to the top of the page and click on the blue button titled "Crawl", then select
"Crawl all domains on this index".
The Crawl button will start spinning, and this will take some time to complete.

Lab 1 is complete.
Lab 2 - Setting Up the Web Front End

Connecting to OpenAI
While we wait for the crawler, we’ll set up the frontend for sending queries to Elastic.

1. In a browser navigate to https://platform.openai.com/, where you'll need to sign up for an
OpenAI account. (Don't worry, it's free.) (If you are unable to use your account or create
one, we will try to provide keys for use with this workshop.)

2. Click on your account and then click on "View API keys".

3. Now we’ll need to generate an API key to use for connecting in python. Click on “API
Keys”.

4. Click “Create New Secret”


5. Copy the new key and save it. It will not be displayed again.
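
This key is what the web frontend in the next section uses behind the scenes. If you want to test it directly, here is a minimal sketch using the pre-1.0 openai Python package (newer package versions use a different client interface, and the model name is only an example):

import openai

openai.api_key = "sk-..."  # paste the key you just saved

# Send a single chat message and print the reply
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # example; availability depends on your account
    messages=[{"role": "user", "content": "What is Elasticsearch?"}],
)
print(response["choices"][0]["message"]["content"])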

Launching Streamlit
Now we will use an app called Streamlit to run a web-based frontend to submit our queries to
Elastic and ChatGPT.

1. Navigate back to the “Terminal” view in Strigo.

2. At the command prompt, type “cd src/elasticgpt/” and press enter.


cd src/elasticgpt/

3. To start the frontend application that will give us a webpage to interact with Elastic and
ChatGPT, run the following command…

streamlit run elasticdocs_gpt.py

There may be a couple of warnings; these can be ignored.
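
For context, the basic shape of a Streamlit app is quite small. The sketch below is not the workshop's elasticdocs_gpt.py (which also queries Elasticsearch and calls OpenAI); it only illustrates how Streamlit renders inputs and output:

import streamlit as st

st.title("ElasticDocs GPT")  # hypothetical title

query = st.text_input("Ask a question about the Elastic docs")
if query:
    # The real app would search Elasticsearch here and send the top hit
    # to the OpenAI API; this sketch just echoes the question back.
    st.write(f"You asked: {query}")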

4. Copy the External URL like below…

5. Paste the URL into a new browser tab. Your URL should look similar to this (if you're on
a VPN or corporate network, this port might be blocked)…

6. Press Enter and a page like below should appear. We’ll fill in the information for this
page during the next lab…
Lab 2 is complete.
Lab 3 - Ask Questions
1. Using the page we loaded from Lab 2, enter the API key for OpenAI.
(The other fields under the "Click here to enter Elasticsearch cluster connectivity
information" expansion can be ignored; they are included for those who want to run this
environment outside of this workshop.)

2. Below the inputs you'll see a dropdown box that allows you to select from different
OpenAI models. Feel free to choose whichever you like. Keep in mind that the models
that allow for more tokens will give better answers; however, they also cost more.
(Also note that depending on your account type and status, some models may be
unavailable.)

3. In the prompt within the browser window, enter the question: “What is ELSER?”
The response should look like this…

4. Another question to ask would be: “Generate an elasticsearch query to search for
cows in index cow-logs.” This will respond with a properly formatted query to
search the index for cows.
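
The generated query varies from run to run, but it is typically along these lines (illustrative only; the field name is whatever the model invents for the hypothetical cow-logs index):

GET cow-logs/_search
{
  "query": {
    "match": {
      "message": "cows"
    }
  }
}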
5. Lastly, we can try asking an unrelated question: "How do I build a boat?"
Due to the focused data we're using, ChatGPT is unable to answer.
