Week 5: Large Language Models

Large language models

This module covers the practical usage of large language models (LLMs), a relatively new area.

This module is experimental. Things may break when trying these notebooks or during your GA.
Try again, gently.
LLMs incur a cost. We have created API keys for everyone to use gpt-3.5-turbo and text-embedding-3-small. Your usage is limited to 50 cents for this course. Don't exceed that.
Use AI Proxy instead of OpenAI. Specifically:
1. Replace API calls to https://api.openai.com/... with https://aiproxy.sanand.workers.dev/openai/...
2. Replace the OPENAI_API_KEY with the AIPROXY_TOKEN that someone will give you.
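
Here is a minimal sketch of what that swap looks like in Python. It assumes the openai Python package (v1 or later) and an AIPROXY_TOKEN environment variable; the /openai/v1 base URL is inferred from the substitution rule above, not confirmed elsewhere.

```python
# A minimal sketch of the AI Proxy swap, assuming the `openai` package (v1+)
# and an AIPROXY_TOKEN environment variable. The /openai/v1 path is inferred
# from the substitution rule above.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AIPROXY_TOKEN"],                    # instead of OPENAI_API_KEY
    base_url="https://aiproxy.sanand.workers.dev/openai/v1",  # instead of https://api.openai.com/v1
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(response.choices[0].message.content)
```

Everything else (models, request parameters, response objects) stays the same as when calling OpenAI directly.
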

LLM Sentiment analysis (Video 1)


Video summary

This video explores using large language models (LLMs) for sentiment analysis and
classification. It demonstrates how to use the OpenAI API to analyze movie reviews for
sentiment and genre without any training.

Highlights:

• 00:06 Introduction to sentiment analysis with LLMs
o Using a small movie reviews dataset
o Identifying sentiment as positive or negative
o Determining the genre of the movie
• 01:05 Using OpenAI API for sentiment analysis
o Choosing the GPT-3.5 Turbo model
o Providing system instructions and movie reviews
o Analyzing the sentiment of reviews
• 04:09 Comparing different LLM models
o Testing GPT-3.5 Turbo and GPT-4 models
o Observing differences in sentiment analysis results
o Discussing model capabilities and token usage
• 10:12 Implementing sentiment analysis programmatically
o Using Python and OpenAI API
o Storing API keys and making requests
o Handling responses and extracting sentiment
• 24:02 Training LLMs with examples
o Providing examples to train the model
o Using few-shot learning for better results
o Discussing prompt engineering and cost considerations

You'll learn how to use large language models (LLMs) for sentiment analysis and classification,
covering:

• Sentiment Analysis: Use OpenAI API to identify the sentiment of movie reviews as positive
or negative.
• Prompt Engineering: Learn how to craft effective prompts to get desired results from
LLMs.
• LLM Training: Understand how to train LLMs by providing examples and feedback.
• OpenAI API Integration: Integrate OpenAI API into Python code to perform sentiment
analysis.
• Tokenization: Learn about tokenization and its impact on LLM input and cost.
• Zero-Shot, One-Shot, and Multi-Shot Learning: Understand different approaches to using
LLMs for learning.
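
As a concrete illustration of the zero-shot approach above, here is a minimal sketch that classifies two reviews with gpt-3.5-turbo through AI Proxy. The prompt wording and the review texts are illustrative, not taken from the video.

```python
# A minimal zero-shot sentiment-classification sketch. The prompt wording and
# reviews are illustrative; the client setup assumes AI Proxy as described above.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AIPROXY_TOKEN"],
    base_url="https://aiproxy.sanand.workers.dev/openai/v1",
)

reviews = [
    "A beautifully shot film with a story that stays with you.",
    "Two hours of my life I will never get back.",
]

for review in reviews:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            # Ask for a one-word label so the output is easy to parse.
            {"role": "system",
             "content": "Classify the movie review as POSITIVE or NEGATIVE. Reply with one word."},
            {"role": "user", "content": review},
        ],
    )
    print(response.choices[0].message.content.strip(), "|", review)
```

Adding a few labelled review/label pairs as extra user and assistant messages before the final review turns this zero-shot prompt into one-shot or multi-shot prompting, at the cost of extra input tokens.
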
Here are the links used in the video:

• Jupyter Notebook
• Movie reviews dataset
• OpenAI Playground
• OpenAI Pricing
• OpenAI Tokenizer
• OpenAI API Reference
• OpenAI Docs

LLM Extraction (Video 2)


Video summary

This video discusses how to systematically extract structured information from a dataset
using language models. It covers various techniques and tools, including JSON schemas and
OpenAI’s API, to extract and format data efficiently.

Highlights:

• 00:02 Introduction to data extraction
o Explanation of systematic information extraction
o Example of extracting state names and ZIP codes
o Importance of structured formats like JSON
• 03:05 Using OpenAI API for extraction
o Creating a secret for OpenAI API
o Extracting data using API keys
o Handling unstructured data
• 10:02 Challenges with inconsistent data
o Issues with varying data formats
o Example of famous addresses with inconsistent formats
o Using examples for one-shot prompting
• 18:00 Ensuring valid JSON output
o Importance of response format in JSON
o Using JSON schema for structured output
o Handling invalid JSON responses
• 27:00 Advanced techniques and tools
o Using JSON schema for defining structure
o Example of customer service emails
o Benefits of using structured extraction in production

You'll learn how to use LLMs to extract structure from unstructured data, covering:
• LLM for Data Extraction: Use OpenAI's API to extract structured information from
unstructured data like addresses.
• JSON Schema: Define a JSON schema to ensure consistent and structured output from
the LLM.
• Prompt Engineering: Craft effective prompts to guide the LLM's response and improve
accuracy.
• Data Cleaning: Use string functions and OpenAI's API to clean and standardize data.
• Data Analysis: Analyze extracted data using Pandas to gain insights.
• LLM Limitations: Understand the limitations of LLMs, including potential errors and
inconsistencies in output.
• Production Use Cases: Explore real-world applications of LLMs for data extraction, such
as customer service email analysis.
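
To make the extraction workflow concrete, here is a minimal sketch that pulls fields out of a free-form address using JSON mode. The address and field names are illustrative, and it assumes a gpt-3.5-turbo version that supports response_format={"type": "json_object"}; a JSON schema or function calling, as discussed in the video, gives stricter guarantees about keys and types.

```python
# A minimal sketch of structured extraction with JSON mode. The address and
# field names are illustrative; assumes a model version that supports JSON mode.
import json
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AIPROXY_TOKEN"],
    base_url="https://aiproxy.sanand.workers.dev/openai/v1",
)

address = "1600 Pennsylvania Avenue NW, Washington, DC 20500"

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_format={"type": "json_object"},  # forces syntactically valid JSON
    messages=[
        {"role": "system",
         "content": "Extract city, state and zip_code from the address. "
                    "Respond as a JSON object with exactly those keys."},
        {"role": "user", "content": address},
    ],
)

record = json.loads(response.choices[0].message.content)
print(record["city"], record["state"], record["zip_code"])
```

JSON mode only guarantees valid JSON; it does not guarantee which keys appear, which is why the video moves on to JSON schemas for production use.
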
Here are the links used in the video:

• Jupyter Notebook
• JSON Schema
• Function calling

LLM Topic modelling (Video 3)


Part 1: Video summary

This video explains the concept of embeddings in large language models (LLMs) and
demonstrates how to use OpenAI embeddings for topic modeling. It covers the process of
converting text into numerical arrays, visualizing embeddings, and classifying words into
categories using embeddings and clustering algorithms.

Highlights:

• 00:04 Introduction to embeddings
o Converts text into numerical arrays
o Similar numbers indicate similar meanings
o Visualized using TensorFlow projector
• 02:52 Classifying words into categories
o Example with fruits, countries, and companies
o Using OpenAI embeddings for classification
o Comparing different embedding models
• 10:02 Creating embeddings with OpenAI API
o Steps to create embeddings using API
o Importance of model selection
o Differences between embedding models
• 19:33 Clustering embeddings
o Using K-means clustering algorithm
o Visualizing clusters with Matplotlib
o Adjusting the number of clusters
• 27:03 Cost and efficiency of embeddings
o Cost comparison of different models
o Running embeddings on local machines
o Using sentence transformers for embeddings
You'll learn to use text embeddings to find text similarity and use that to create topics
automatically from text, covering:

• Embeddings: How large language models convert text into numerical representations.
• Similarity Measurement: Understanding how similar embeddings indicate similar
meanings.
• Embedding Visualization: Using tools like Tensorflow Projector to visualize embedding
spaces.
• Embedding Applications: Using embeddings for tasks like classification and clustering.
• OpenAI Embeddings: Using OpenAI's API to generate embeddings for text.
• Model Comparison: Exploring different embedding models and their strengths and
weaknesses.
• Cosine Similarity: Calculating cosine similarity between embeddings for more reliable
similarity measures.
• Embedding Cost: Understanding the cost of generating embeddings using OpenAI's API.
• Embedding Range: Understanding the range of values in embeddings and their
significance.
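
Here is a minimal sketch of the embedding-and-clustering workflow described above: embed a handful of words, check one cosine similarity, and group the words with K-means. The word list, model choice, and cluster count are illustrative; it assumes numpy, scikit-learn, and the AI Proxy setup from earlier.

```python
# A minimal sketch of embeddings, cosine similarity, and K-means clustering.
# Word list, model, and cluster count are illustrative assumptions.
import os

import numpy as np
from openai import OpenAI
from sklearn.cluster import KMeans

client = OpenAI(
    api_key=os.environ["AIPROXY_TOKEN"],
    base_url="https://aiproxy.sanand.workers.dev/openai/v1",
)

words = ["apple", "banana", "mango", "India", "Japan", "Brazil"]

# One API call can embed the whole list at once.
response = client.embeddings.create(model="text-embedding-3-small", input=words)
embeddings = np.array([item.embedding for item in response.data])

# Cosine similarity between the first two words (both fruits, so it should be high).
a, b = embeddings[0], embeddings[1]
print("similarity(apple, banana):", a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Group the words into two clusters: ideally fruits vs countries.
labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(embeddings)
for word, label in zip(words, labels):
    print(label, word)
```
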
Here are the links used in the video:

• Jupyter Notebook
• Tensorflow projector
• Embeddings guide
• Embeddings reference
• Clustering on scikit-learn
• Massive text embedding leaderboard (MTEB)
• gte-large-en-v1.5 embedding model
• Embeddings similarity threshold

Retrieval Augmented Generation


The video is not available yet. Please review the notebook, which is self-explanatory.

You will learn to implement Retrieval Augmented Generation (RAG) to enhance language models'
responses by incorporating relevant context, covering:

• LLM Context Limitations: Understanding the constraints of context windows in large
language models.
• Retrieval Augmented Generation: The technique of retrieving and using relevant
documents to enhance model responses.
• Embeddings: How to convert text into numerical representations that are used for
similarity calculations.
• Similarity Search: Finding the most relevant documents by calculating cosine similarity
between embeddings.
• OpenAI API Integration: Using the OpenAI API to generate responses based on the most
relevant documents.
• Tourist Recommendation Bot: Building a bot that recommends tourist attractions based
on user interests using embeddings.
• Next Steps for Implementation: Insights into scaling the solution with a vector database,
re-rankers, and improved prompts for better accuracy and efficiency.
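
A minimal RAG sketch along these lines, assuming the AI Proxy setup and embedding model used earlier: embed the documents, pick the document most similar to the question by cosine similarity, and pass it to the chat model as context. The documents, question, and prompt wording are illustrative.

```python
# A minimal RAG sketch: embed documents, retrieve the most similar one by cosine
# similarity, and answer with it as context. Documents and prompts are illustrative.
import os

import numpy as np
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["AIPROXY_TOKEN"],
    base_url="https://aiproxy.sanand.workers.dev/openai/v1",
)

docs = [
    "The Eiffel Tower offers panoramic views of Paris.",
    "The Louvre houses the Mona Lisa and thousands of other artworks.",
    "Montmartre is known for its street artists and cafes.",
]
question = "Where can I see famous paintings?"

def embed(texts):
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

doc_vectors = embed(docs)
query_vector = embed([question])[0]

# Cosine similarity of the question against every document; keep the best match.
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
context = docs[int(np.argmax(scores))]

answer = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": f"Answer using only this context: {context}"},
        {"role": "user", "content": question},
    ],
)
print(answer.choices[0].message.content)
```

Swapping the brute-force similarity step for a vector database and adding a re-ranker are the natural next steps mentioned above.
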
Here are the relevant links:

• Jupyter Notebook
• gte-large-en-v1.5 embedding model
• Awesome vector database
