Week 5 Large Language Models
This module covers the practical use of large language models (LLMs), a relatively new area.
This module is experimental. Things may break when trying these notebooks or during your GA.
Try again, gently.
LLMs incur a cost. We have created API keys for everyone to use gpt-3.5-turbo and text-embedding-3-small. Your usage is limited to 50 cents for this course. Don't exceed that.
Use AI Proxy instead of OpenAI. Specifically:
1. Replace the API base URL https://api.openai.com/... with https://aiproxy.sanand.workers.dev/openai/...
2. Replace the OPENAI_API_KEY with the AIPROXY_TOKEN provided to you.
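The two steps above can be sketched in Python. This builds (but does not send) a chat completion request through AI Proxy; it assumes the proxy mirrors OpenAI's `/v1/chat/completions` path and that `AIPROXY_TOKEN` is set in your environment.

```python
import json
import os
from urllib.request import Request

# Step 1: swap the OpenAI base URL for the AI Proxy base URL.
OPENAI_URL = "https://api.openai.com/v1/chat/completions"
PROXY_URL = OPENAI_URL.replace(
    "https://api.openai.com/", "https://aiproxy.sanand.workers.dev/openai/"
)

def build_request(prompt: str) -> Request:
    """Build a chat completion request aimed at AI Proxy."""
    payload = {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
    }
    return Request(
        PROXY_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Step 2: AIPROXY_TOKEN replaces OPENAI_API_KEY.
            "Authorization": f"Bearer {os.environ.get('AIPROXY_TOKEN', '')}",
        },
    )
```

Sending it with `urllib.request.urlopen(build_request("Hello"))` returns the same JSON shape as the OpenAI API, so the rest of your code is unchanged.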
This video explores using large language models (LLMs) for sentiment analysis and
classification. It demonstrates how to use the OpenAI API to analyze movie reviews for
sentiment and genre without any training.
Highlights:
You'll learn how to use large language models (LLMs) for sentiment analysis and classification,
covering:
• Sentiment Analysis: Use OpenAI API to identify the sentiment of movie reviews as positive
or negative.
• Prompt Engineering: Learn how to craft effective prompts to get desired results from
LLMs.
• LLM Training: Understand how to train LLMs by providing examples and feedback.
• OpenAI API Integration: Integrate OpenAI API into Python code to perform sentiment
analysis.
• Tokenization: Learn about tokenization and its impact on LLM input and cost.
• Zero-Shot, One-Shot, and Multi-Shot Learning: Understand different approaches to using
LLMs for learning.
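The zero-shot vs. one-shot vs. multi-shot distinction above boils down to how many labelled examples you put in the prompt. Here is a minimal sketch; the review texts are invented stand-ins, not the course's movie reviews dataset, and the resulting messages list would be sent to gpt-3.5-turbo.

```python
def sentiment_messages(review: str, examples: tuple = ()) -> list:
    """Build a chat-messages list for sentiment classification.

    Zero-shot when `examples` is empty; one-shot or multi-shot when
    (text, label) examples are supplied as in-context demonstrations.
    """
    messages = [{
        "role": "system",
        "content": "Classify the movie review as POSITIVE or NEGATIVE. "
                   "Answer with one word.",
    }]
    # Each labelled example becomes a user/assistant turn the model imitates.
    for text, label in examples:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": review})
    return messages

zero_shot = sentiment_messages("A dazzling, heartfelt film.")
one_shot = sentiment_messages(
    "A dazzling, heartfelt film.",
    (("Dull plot and wooden acting.", "NEGATIVE"),),
)
```

Note that every example you add consumes tokens, so multi-shot prompts trade higher cost for better accuracy.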
Here are the links used in the video:
• Jupyter Notebook
• Movie reviews dataset
• OpenAI Playground
• OpenAI Pricing
• OpenAI Tokenizer
• OpenAI API Reference
• OpenAI Docs
This video discusses how to systematically extract structured information from a dataset
using language models. It covers various techniques and tools, including JSON schemas and
OpenAI’s API, to extract and format data efficiently.
Highlights:
You'll learn how to use LLMs to extract structure from unstructured data, covering:
• LLM for Data Extraction: Use OpenAI's API to extract structured information from
unstructured data like addresses.
• JSON Schema: Define a JSON schema to ensure consistent and structured output from
the LLM.
• Prompt Engineering: Craft effective prompts to guide the LLM's response and improve
accuracy.
• Data Cleaning: Use string functions and OpenAI's API to clean and standardize data.
• Data Analysis: Analyze extracted data using Pandas to gain insights.
• LLM Limitations: Understand the limitations of LLMs, including potential errors and
inconsistencies in output.
• Production Use Cases: Explore real-world applications of LLMs for data extraction, such
as customer service email analysis.
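The JSON-schema and function-calling ideas above can be sketched as follows. The schema fields (`street`, `city`, `zip`) are illustrative assumptions, not the course's exact schema; the `tools` structure follows OpenAI's function-calling format, and the parser guards against the malformed output the "LLM Limitations" point warns about.

```python
import json

# A JSON schema constraining the structure the LLM must return.
address_schema = {
    "type": "object",
    "properties": {
        "street": {"type": "string"},
        "city": {"type": "string"},
        "zip": {"type": "string"},
    },
    "required": ["street", "city", "zip"],
    "additionalProperties": False,
}

# Passed to the chat API so the model replies with a tool call whose
# arguments match the schema, instead of free-form prose.
tools = [{
    "type": "function",
    "function": {
        "name": "extract_address",
        "description": "Extract a postal address from free text",
        "parameters": address_schema,
    },
}]

def parse_tool_call(arguments_json: str) -> dict:
    """Parse the model's tool-call arguments defensively: LLM output
    can still be invalid JSON or miss required fields."""
    try:
        out = json.loads(arguments_json)
    except json.JSONDecodeError:
        return {}
    if not all(key in out for key in address_schema["required"]):
        return {}
    return out
```

Rows that come back as `{}` can then be retried or flagged for manual review before the Pandas analysis step.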
Here are the links used in the video:
• Jupyter Notebook
• JSON Schema
• Function calling
This video explains the concept of embeddings in large language models (LLMs) and
demonstrates how to use OpenAI embeddings for topic modeling. It covers the process of
converting text into numerical arrays, visualizing embeddings, and classifying words into
categories using embeddings and clustering algorithms.
Highlights:
• Embeddings: How large language models convert text into numerical representations.
• Similarity Measurement: Understanding how similar embeddings indicate similar
meanings.
• Embedding Visualization: Using tools like TensorFlow Projector to visualize embedding
spaces.
• Embedding Applications: Using embeddings for tasks like classification and clustering.
• OpenAI Embeddings: Using OpenAI's API to generate embeddings for text.
• Model Comparison: Exploring different embedding models and their strengths and
weaknesses.
• Cosine Similarity: Calculating cosine similarity between embeddings for more reliable
similarity measures.
• Embedding Cost: Understanding the cost of generating embeddings using OpenAI's API.
• Embedding Range: Understanding the range of values in embeddings and their
significance.
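Cosine similarity, highlighted above, is just the dot product of two embeddings divided by the product of their lengths. The tiny three-dimensional vectors below are toy stand-ins for real OpenAI embeddings (which have 1,536 dimensions for text-embedding-3-small); the point is only that similar meanings give similar directions.

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two embedding vectors: 1.0 means
    identical direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: "king" and "queen" point in nearly the same direction,
# "pizza" points elsewhere.
king, queen, pizza = [0.9, 0.8, 0.1], [0.85, 0.82, 0.15], [0.1, 0.2, 0.95]
```

Because cosine similarity ignores vector length, it is a more reliable similarity measure than the raw dot product when embedding magnitudes vary.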
Here are the links used in the video:
• Jupyter Notebook
• Tensorflow projector
• Embeddings guide
• Embeddings reference
• Clustering on scikit-learn
• Massive text embedding leaderboard (MTEB)
• gte-large-en-v1.5 embedding model
• Embeddings similarity threshold
You will learn to implement Retrieval Augmented Generation (RAG) to enhance language models'
responses by retrieving relevant context and including it in the prompt.
Here are the links used in the video:
• Jupyter Notebook
• gte-large-en-v1.5 embedding model
• Awesome vector database
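The RAG loop can be sketched as: score each document chunk against the query, retrieve the top matches, and prepend them to the prompt. A real pipeline would score chunks by cosine similarity of embeddings (from a model such as gte-large-en-v1.5) stored in a vector database; here a toy word-overlap score stands in for embedding similarity so the sketch stays self-contained.

```python
def score(query: str, chunk: str) -> float:
    """Toy relevance score: word overlap (a stand-in for the cosine
    similarity of embeddings a real RAG system would use)."""
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / len(q | c) if q | c else 0.0

def retrieve(query: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most relevant to the query."""
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]

def rag_prompt(query: str, chunks: list) -> str:
    """Augment the prompt with retrieved context before generation."""
    context = "\n".join(retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Invented example chunks, not from the course materials.
docs = [
    "Embeddings map text to vectors.",
    "RAG retrieves context before generation.",
    "Pandas analyzes tabular data.",
]
```

The string returned by `rag_prompt` is what you would send to the chat model, grounding its answer in the retrieved chunks rather than its training data alone.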