CB116-Lab-Workbook (6.x)
Couchbase 6.x+
Lab Workbook
Leo Schuman
with Keshav Murthy and Kevin Holder
Lab 1 - Install Couchbase and load sample data
Objectives
If you have an existing Couchbase installation you wish to restore later, archive the data and
configuration files from these locations.
Linux /opt/couchbase/var/lib/couchbase
You want to run Couchbase from your local system for learning purposes.
1. Download Couchbase Server Enterprise Edition 6.x+ for your operating system.
http://www.couchbase.com/downloads
2. Review the Release Notes, and install as described in the documentation for your OS:
http://docs.couchbase.com
Note, on Windows, you must run the installer using elevated Administrator permissions.
To explore container-based installation, see:
http://www.couchbase.com/containers
3. If the Couchbase Setup tool does not launch automatically, due to your local
configuration, open a web browser, and browse to this URL to launch the Setup tool.
http://localhost:8091
5. In the Setup tool, review all settings and accept all defaults, except for changing or
confirming the following:
Note, if requested by your local firewall, accept incoming network connections for
beam.smp, memcached, epmd, indexer, moxi, projector, cbq-engine, and cbft. A full list
of port requirements is available here:
https://developer.couchbase.com/documentation/server/current/install/install-ports.html
Note, the settings for this course run a low-impact, single-node Couchbase cluster on a
local system, for learning purposes only and not for performance testing. Couchbase advises
a minimum of 3 nodes for any production cluster. Please read the documentation.
7. In the Couchbase UI, navigate to and briefly review each top-level screen.
You want to create and load the basic Couchbase data container, a bucket.
Note, while you will see no activity at this point, you may click the Statistics link to view
the scope of statistical data available for bucket-specific operations.
META().id = "airline_10"
Note, any valid N1QL filtering expression ("WHERE clause") may be used in this field.
N1QL filtering syntax, including use of the META() function, will be introduced more
generally, in the following Labs.
18. Click the Edit button for any index. This will open the Query Workbench.
Note, the Query Workbench will be used and explored further throughout all Labs ahead.
20. (Optional) Locate and explore the def_primary index, and compare its syntax with other
indexes imported as part of the travel-sample data set.
End of Lab
You want data from specific fields, from limited documents matching a condition.
1. In the Couchbase UI, select the Query screen for the Workbench. Notice that
travel-sample is listed as a Fully Queryable Bucket.
Note, travel-sample is fully queryable because a primary index has been defined and
imported with the sample data, and rebuilt during import. More on this in the next Lab.
2. In the Query Workbench, write and execute a query to select all fields from airport
documents, limiting the results to no more than 10 documents.
SELECT *
FROM `travel-sample`
WHERE type = "airport"
LIMIT 10;
Note, code samples can be copied and pasted from this workbook.
3. You should see results like the following. Notice the metadata returned, and that results
are structured as an array ( [ … ] ) of JSON objects ( { … } )
5. Write and execute a query to select the name, address, and reviews fields from hotel
documents, limiting the results to 10 documents.
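A minimal sketch of one query that satisfies this step, assuming hotel documents carry type = "hotel" in the same way airport documents carry type = "airport":

```sql
/* Select three named fields from hotel documents only, capped at 10 results. */
SELECT name, address, reviews
FROM `travel-sample`
WHERE type = "hotel"
LIMIT 10;
```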
6. You should see the following. Again, notice the structure of multidimensional results.
8. Switch to JSON display, and scroll through the results. Notice that a result is generated
for any document containing at least one selected field, and that documents can be
distinguished by consistent use of a type field.
Note, the document type is encoded both as a field and as a prefix to the document key.
Key prefixing is a common NoSQL document database pattern. However, embedding a
type field, as shown, is a very useful practice when using N1QL.
9. Count the number of documents of each type in the bucket, by grouping results by type,
and counting the results. Use an alias to identify the expression result as count.
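One way this step might be written is sketched here. With no WHERE clause, it is assumed that the query is served by the primary index imported with travel-sample:

```sql
/* Group every document by its type field and count each group. */
SELECT type, COUNT(*) AS count
FROM `travel-sample`
GROUP BY type;
```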
You want to apply multiple filters, and exclude or include documents based on whether a
specified field is present.
10. Select, group, and count all documents by country. Notice that documents with no
country field are also counted.
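A sketch of such a query follows. Documents lacking a country field fall into a single group whose country is absent from the result object, which is why they are still counted:

```sql
/* Count documents per country; documents without a country field form one extra group. */
SELECT country, COUNT(*) AS count
FROM `travel-sample`
GROUP BY country;
```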
12. Modify the query to check whether country has been omitted from any hotel document.
The query should return no records.
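In N1QL, a field that is absent from a document is MISSING, which is distinct from a field that is present with a null value. A hedged sketch of an omitted-field check, assuming the hotel type filter used in earlier steps:

```sql
/* IS MISSING matches documents where the field is absent.
   IS NULL would match only documents where the field is present with a null value. */
SELECT country, COUNT(*) AS count
FROM `travel-sample`
WHERE type = "hotel" AND country IS MISSING
GROUP BY country;
```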
Note, query execution times vary widely with cluster size and capacity. In this course,
treat all timing comparisons as relative rather than absolute.
You want to determine what index or indexes are in use, along with related details.
12. Click Explain to generate the query plan for the prior query. Click Plan as the output
display format. Notice the query has relied on an index named def_type on the
travel-sample bucket.
Note, the travel-sample bucket imports along with a series of predefined indexes. Index
creation and deletion will be explored in the next lab.
Note, if using a text based tool, such as cbq, query plans may be generated by prefixing
any query with the EXPLAIN keyword.
EXPLAIN
SELECT country, COUNT(*) AS count
FROM `travel-sample`
WHERE type = "hotel" AND country IS NULL
GROUP BY country
End of Lab
1. In the Query screen of the Couchbase UI, drop the def_type index which was imported
with the travel-sample bucket.
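A minimal sketch of the drop statement, using the bucket-qualified index name form:

```sql
/* Drop the named secondary index def_type from the travel-sample bucket. */
DROP INDEX `travel-sample`.`def_type`;
```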
4. Compare this execution time to the time you noticed in the prior Lab, when this query
was run using the def_type index.
5. Explain this query, and notice that while it still runs, it is relying on a primary index scan.
You want to visualize the necessity of a primary index, if no other index is available.
Note, because there can be only one primary index per bucket, it is not necessary to
name this index. However, if you do, as in travel-sample, it is dropped using the syntax
shown, like any named index. An unnamed primary index can be dropped with:
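Both forms are sketched here, assuming the named primary index is def_primary, as imported with travel-sample:

```sql
/* Named primary index: dropped by name, like any other named index. */
DROP INDEX `travel-sample`.`def_primary`;

/* Unnamed primary index: dropped by bucket, with no index name given. */
DROP PRIMARY INDEX ON `travel-sample`;
```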
Because of our memory-first architecture, Couchbase does not support table scans.
Instead, a primary index delivers the same capability. Any arbitrary query can be run
against the primary index of a bucket. However, a primary index is optional: you decide
whether and when to create and maintain one (e.g., during development). In production,
a primary index might well be dropped to conserve the resources otherwise spent
maintaining it.
7. Click the query history back arrow to return to the most recent SELECT query, then
execute this query.
Note, without at least a primary index, most N1QL queries will not run. In most use
cases, however, production queries will rely on one or more well-designed secondary
indexes, rather than a primary index.
You want to create a secondary index to support the needs of a specific query.
8. Review the most recent SELECT query, and consider which fields might be most
usefully indexed, to speed this query's performance. Filter ("WHERE clause") fields are
the most obvious candidates for indexing.
9. Create a multi-field index named def_country_type on the travel-sample bucket, for the
type and country fields.
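A sketch of the index definition, with the fields listed in the order the step names them:

```sql
/* Composite secondary index covering both filter fields of the query. */
CREATE INDEX def_country_type ON `travel-sample`(type, country);
```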
10. Use history to access and run the previous SELECT query. The query should run once
again, and significantly faster than before.
11. Recreate a primary index on the travel-sample bucket. Leave this one unnamed.
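A minimal sketch of creating an unnamed primary index:

```sql
/* No index name is given, so the default name is assigned. */
CREATE PRIMARY INDEX ON `travel-sample`;
```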
12. In the Indexes screen, notice the default name given to a primary index: #primary.
13. Run and then explain the following query. Notice that it runs, but relatively slowly.
14. Explain the query to confirm what you likely expect: it relies on the primary index.
You want to index specified fields, but only for a particular document type.
15. Create a new index for the type and country fields, but restrict it to airline documents.
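A sketch of a partial index follows. The index name def_airline_country_type is hypothetical, chosen only for illustration:

```sql
/* The WHERE clause limits index entries to airline documents,
   so documents of other types are not indexed here. */
CREATE INDEX def_airline_country_type
ON `travel-sample`(type, country)
WHERE type = "airline";
```

For this index to be selected, a query's own filter would need to include the same type = "airline" condition.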
17. Examine the query plan to verify the newly created index is in use. Review available
indexes, and consider which would need to be dropped for the query to fail to run.
End of Lab
1. In the Query screen of the Couchbase UI, insert a new JSON document into the
travel-sample bucket, and assign "abc::123" as its key.
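A minimal sketch of such an insert. The document body is hypothetical, since any valid JSON value may be used:

```sql
/* INSERT fails if the key already exists; the value shown is illustrative only. */
INSERT INTO `travel-sample` (KEY, VALUE)
VALUES ("abc::123", { "name": "Example document" });
```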
Note, key patterns are no more enforced by Couchbase than document schema. Key
prefixing (e.g., "abc::123") is a common pattern for assigning a document type, and
virtually any text separator may be used between prefix and id (e.g., "abc::123"). But, a
type field serves the same purpose and, as shown in prior Labs, is easily used in query
filters. It would be redundant to use both approaches to identify document type.
2. Execute the query again. You should get an error, because this key already exists.
You want to aggregate data and insert results as new documents with embedded type.
4. Count airlines by their country, using aliasing to add a type field with the literal string
value "country-count" to the results. From these results, assign the country value as an
id, and the full result as a document, named doc, as a selection of data to be upserted
as a new key (named id) and value (named doc) into the travel-sample bucket.
Based on the travel-sample data, you should see 3 new documents created.
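One way this might be written is sketched here, assuming a subquery aliased as cc (a hypothetical name) to hold each aggregated result:

```sql
/* The inner query counts airlines per country and adds a literal type field.
   The outer query names the country as the key (id) and the whole result as the value (doc). */
UPSERT INTO `travel-sample` (KEY id, VALUE doc)
SELECT cc.country AS id, cc AS doc
FROM (
  SELECT country, COUNT(*) AS count, "country-count" AS type
  FROM `travel-sample`
  WHERE type = "airline"
  GROUP BY country
) AS cc;
```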
Note, alternatively, you could query for these documents by their type field.
SELECT *
FROM `travel-sample`
WHERE type = "country-count";
DELETE
FROM `travel-sample`
WHERE type = "country-count";
You want to insert new documents, using an expression to generate a key-prefixed type.
7. Modify the prior query to remove the embedded type field. Instead, use an expression in
the SELECT clause to generate a document key (id) which concatenates
"country-count::" as a prefix to the country value.
Note, tracking document type is a best practice, but is no more enforced, by Couchbase,
than any other aspect of your schema. The two prevailing patterns for doing so are to
either embed a type field, or assign a type prefix to the document key. Broadly speaking,
key-value oriented access patterns may benefit from key prefixing, while N1QL oriented
access patterns may benefit from embedded type fields, which are easily filtered.
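A hedged sketch of the key-prefixed variant described in step 7, again assuming a hypothetical subquery alias cc:

```sql
/* The type is carried by the key prefix instead of an embedded field.
   The || operator concatenates the prefix and the country value. */
UPSERT INTO `travel-sample` (KEY id, VALUE doc)
SELECT "country-count::" || cc.country AS id, cc AS doc
FROM (
  SELECT country, COUNT(*) AS count
  FROM `travel-sample`
  WHERE type = "airline"
  GROUP BY country
) AS cc;
```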
Note, alternatively, you could query for these documents by their key prefix.
SELECT *
FROM `travel-sample`
WHERE META().id LIKE "country%";
SELECT *
FROM `travel-sample`
USE KEYS ['country-count::France', 'country-count::United Kingdom',
'country-count::United States'];
Note, the USE KEYS clause provides direct key-value access via N1QL. Because it
relies on direct key access, analogous to an SDK-level get() by key operation, it is
highly performant and requires no index.
End of Lab
B. Select and sum joined document data, view plan (JOIN ON, ROUND(), SUM())
You want to explore the travel-sample document model and count by type.
1. In the Couchbase UI, verify that type fields in all travel-sample documents are indexed.
You may have deleted the def_type index, if you completed all prior labs. Recreate this
index, if needed.
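If the index was dropped, it might be recreated along these lines:

```sql
/* Recreate the single-field secondary index on type. */
CREATE INDEX def_type ON `travel-sample`(type);
```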
2. Query the distinct document types in the bucket. You should see route, airline, airport,
hotel, and landmark as distinct document types.
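A minimal sketch of this query:

```sql
/* DISTINCT collapses repeated type values to one result per type. */
SELECT DISTINCT type
FROM `travel-sample`;
```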
5. In the Bucket Insights view, explore the field structure of airport and route documents.
Notice that airport documents have a faa field, containing an airport code. And, route
documents have a sourceairport field, identifying the beginning airport of that route. Also,
notice the indexing information provided by this view.
7. Select all document fields where the document type is within an array containing airport
or route, and the document also relates to San Francisco International airport (SFO), per
the two fields described from the Bucket Insights above. You should get 1 airport
document and 249 route documents.
SELECT *
FROM `travel-sample`
WHERE type IN ["airport", "route"]
AND (faa = "SFO" OR sourceairport = "SFO");
B. Select and sum joined document data, view plan (JOIN ON, ROUND(), SUM())
You want to select and manipulate data from documents with common field data.
8. From route and airport documents, select the beginning airport document's faa code,
aliased as begin_faa, the route document's destinationairport code, aliased as end_faa, and
the route document's distance, aliased as distance, for all routes originating from SFO.
9. Join airport and route documents on the related values in their faa and sourceairport
fields, aliasing the left as airport and the right as route.
10. Use type filters to restrict the left (airport) to airport documents, and the right (route) to
route documents.
11. Round the distance to two decimal places.
12. Your query should look like this:
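A sketch of one query consistent with the preceding steps. It assumes the join relates airport.faa to route.sourceairport, since the routes originate from SFO, and that an index on sourceairport is available to serve the right side of the join:

```sql
/* Left side: airport documents. Right side: route documents.
   ROUND(expr, 2) rounds the distance to two decimal places. */
SELECT airport.faa AS begin_faa,
       route.destinationairport AS end_faa,
       ROUND(route.distance, 2) AS distance
FROM `travel-sample` AS airport
JOIN `travel-sample` AS route
  ON airport.faa = route.sourceairport
WHERE airport.type = "airport"
  AND route.type = "route"
  AND airport.faa = "SFO";
```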
16. Explain the query, and display its visual query plan. Use the visual query plan display
controls to change display direction and adjust size. Click-drag as needed. Review how
this query has been processed. Notice available indexes, and how they have been used.
End of Lab