0% found this document useful (0 votes)
27 views

GPT-4o API Deep Dive Text Generation Vision and Function Calling

GPT-4o API Deep Dive: Text Generation, Vision, and Function Calling

Uploaded by

Marcos Luis
Copyright
© © All Rights Reserved
Available Formats
Download as PDF or read online on Scribd
0% found this document useful (0 votes)
27 views

GPT-4o API Deep Dive Text Generation Vision and Function Calling

GPT-4o API Deep Dive: Text Generation, Vision, and Function Calling

Uploaded by

Marcos Luis
Copyright
© © All Rights Reserved
Available Formats
Download as PDF or read online on Scribd
You are on page 1/ 21
9si24, 8:31 AM GPT-4o API Deep Dive: Text Generation, Vision, and Functon Caling | MLExpert - Get Things Done with Al Bootcamp Blog The latest OpenAl model, GPT-4o, is here! What are the new features and improvements? hitpswww-mlexper iblog!gpt-do-api 12t 9si24, 8:31 AM GPT-4o API Deep Dive: Text Generation, Vision, and Functon Caling | MLExpert - Get Things Done with Al Bootcamp GPT-40 API Deep Dive: Text Generation, Streaming, Vision, and Function Call... Join the AI BootCamp! hitpswww-mlexper itblog!gpt-4o-api 221 9115124, 8:91 AM (GPT API Deep Dive: Text Generation, Vision, ané Function Calling | MLExpert -Get Things Done with Al Bootcamp JOIN NOW Setup ‘© Want to follow along? The Jupyter notebook is available at this Github repository To get ready for our deep dive into the GPT-4o API, we'll start by installing the necessary libraries and setting up our environment. First, open your terminal and run the following commands to install the required libraries: pip install -Uqaq pip pip install -qqq opena: pip install -qqq tiktoke progress-bar off 1.30.1 --progress-bar off 0.7.0 --progress-bar off We need two key libraries: openai 2 and tiktoken 2, The openai library lets us make API calls to the GPT-4o model. The tiktoken library helps us with tokenizing the text for the model. Next, let's download an image that we'll use for vision understanding: gdown 1nO9NdTgHJA3CLEQCyNcrL_Ic@s7HgXSN Now, let's import the required libraries and set up our environment in Python import base64 import json import os import textwrap from inspect import cleandoc from pathlib import Path from typing import List Intps:wwwmlexpertiovloglgot-4o-ap| avai 9115124, 8:91 AM (GP'T-0 API Deep Dive: Text Generation, Vision, ane Functon Calling | MLExpert - Get Things Done with Al Bootcamp import requests import tiktoken from google.colab import userdata from IPython.display import Audio, Markdown, display from openai import OpenAT from PIL import Image from tiktoken import Encoding # Set the OpenAI API key from the environment variable os.environ[“OPENAT_APT_KEY"] = userdata.get("OPENAT_API_KEY") MODEL_NAME = “gpt-4o" SEED = 42 client = Openat() def format_response(response): This function formats the GPT-40 response for better readability. response_txt text = for chunk in response_txt.split("\n"): response. choices[@].message. content if not chunk: continue text += ("\n".join(textwrap.wrap(chunk, 10, break_long_words=False))).strip() return text.strip() In the above code, we set up the Opendl client with the API key stored in the environment variable OPENAT_API_KEY . We also defined a helper function format_response to format the GPT-4o responses for better readability Prompting via the API Calling the GPT-40 model via the API is straightforward. You provide a prompt (in the form of a messages array) and receive a response. Let's walk through an example where we prompt the model for a simple text completion task htips:ihwwwemlexpertoflogiget-40-ap| ara 9115124, 8:91 AM (GP'T-0 API Deep Dive: Text Generation, Vision, ane Functon Calling | MLExpert - Get Things Done with Al Bootcamp atime messages = [ { role": "system content": "You are Dwight K. Schrute from the TV show the Office”, b {"role": “user” Explain how GPT-4 works"}, ] response = client. chat.completions.create( model=MODEL_NAME, messages=messages, seed=SEED, temperature=0.000001 ) response ChatCompletion( id="chatcmp1-9QyRx7jFE1Z77b11nRSMO4UPQC6czZ", choices=[ Choice( finish_reason="stop", index-@, logprobs=None, message=ChatCompletionMessage( h, artificial intelligence, a . ssistant", function_call-None, tool_calls=None, de Lb created=1716215925, model="gpt-40-2024-@5-1: object="chat.completion", system_fingerprint="*p_729ea513f7", usage=CompletionUsage(completion_tokens=434, prompt_tokens=30, total_tokens=464), ) > In the messages array, roles are defined as follows * system: Sets the context for the conversation, © user: The prompt or question for the model * assistant: The response generated by the model htips:ihwwwemlexpertoflogiget-40-ap| 521 9115124, 8:91 AM (GPT-to API Deep Dive: Text Generation, Vision, ané Function Calling | MLExpert -Get Things Done with Al Bootcamp * tool: The response generated by a tool or function. The response object contains the completion generated by the model. Here's how you can check the token usage: usage = response.usage print( veo Tokens used Prompt: {usage.prompt_tokens} Completion: {usage.completion_tokens} Total: {usage.total_tokens} d Tokens Used Prompt: 38 Completion: 434 Total: 464 To access the assistant's response, use the response. choices[@].message.content structure, This gives you the text generated by the model for your prompt. GPT-4o Ah, artificial intelligence, a fascinating subject! GPT-4, or Generative Pre-trained Transformer 4, is a type of AI language model developed by Opendl. It's like a super-intelligent assistant that can understand and generate human-like text based on the input it receives. Here's a breakdown of how it works: 1. **Pre-training**: GPT-4 is trained on a massive amount of text data from the internet. This helps it learn grammar, facts about the world, reasoning abilities, and even some level of common sense. Think of it as a beet farm where you plant seeds (data) and let them grow into beets (knowledge). 2. **Transformer Architecture**: The "T" in GPT stands for Transformer, which is a type of neural network architecture. Transformers are great at handling sequential data and can process words in relation to each other, much like how I can process the hierarchy of tasks in the office. 3. **attention Mechanism**: This is a key part of the Transformer. It allows the model to focus on different parts of the input text when generating a Intps:wwwmlexpertiovloglgot-4o-ap| a1 9115124, 8:91 AM (GPT-to API Deep Dive: Text Generation, Vision, ané Function Calling | MLExpert -Get Things Done with Al Bootcamp response. It's like how I focus on different aspects of beet farming to ensure a bountiful harvest. . **Fine-tuning**: After pre-training, GPT-4 can be fine-tuned on specific datasets to make it better at particular tasks. For example, if you wanted it to be an expert in Dunder Mifflin's paper products, you could fine-tune it on our sales brochures and catalogs. **Inference**: When you input a prompt, GPT-4 generates a response by predicting the next word in a sequence, one word at a time, until it forms a complete and coherent answer. It’s like how I can predict Jim's next prank based on his previous antics. In summary, GPT-4 is a highly advanced AI that uses a combination of pre-training, transformer architecture, attention mechanisms, and fine-tuning to understand and generate human-like text. It's almost as impressive as my beet farm and my skills as Assistant Regional Manager (or Assistant to the Regional Manager, depending on who you ask). Count Tokens in a Prompt Managing token usage in your prompts can significantly optimize your interactions with Al models. Here's a simple guide on how to count tokens in your text using the tiktoken library. Counting Tokens in a Text First, you need to get the encoding for your model: encoding tiktoken.encoding_for_model (MODEL_NAME) print (encoding) With the encoding ready, you can now count the tokens in a given text: def count_tokens_in_text(text: str, encoding) -> int: return len(encoding.encode(text) ) text = "You are Dwight K. Schrute from the TV show The Office" Intps:wwwmlexpertiovloglgot-4o-ap| a1 9115724, 831 AM (GPT-4o API Deep Dive: Text Generation, Vision, and Function Calling | MLExpert - Get Things Done wth Al Bootcamp print(count_tokens_in_text(text, encoding)) This code will output 13 This simple function counts the number of tokens in the text. Counting Tokens in a Complex Prompt If you have a more complex prompt with multiple messages, you can count the tokens like this: def count_tokens_in_messages(messages, encoding) -> int: tokens_per_message = 3 tokens_per_name = 1 num_tokens = @ for message in messages num_tokens += tokens_per_message for key, value in message.items(): num_tokens += len(encoding.encode(value)) if key == “name” num_tokens += tokens_per_name num_tokens #= 3 # This accounts for the end-of-prompt token return num_tokens messages = [ ole": "system", "content": "You are Dwight K. Schrute from the TV show The Office", ys {"role": "user", "content": “Explain how GPT-4 works"}, print (count_tokens_in_messages(messages, encoding)) This will output: Intps:wwwmlexpertiovloglgot-4o-ap| rai 9115124, 8:91 AM (GP'T-0 API Deep Dive: Text Generation, Vision, ane Functon Calling | MLExpert - Get Things Done with Al Bootcamp 30 This function counts the tokens in a list of messages, considering both the role and content of each message. It also adds tokens for the role and name fields. Note that this method is specific to the GPT-4 model. By counting tokens, you can better manage your usage and ensure more efficient interactions with the Al model. Happy coding! Streaming Streaming allows you to receive responses from the model in chunks. This can be really useful for long answers or real-time applications. Here's a simple guide on how to stream responses from the GPT-4 model First, we set up our messages: messages = [ { "role": "system", "content": "You are Dwight K. Schrute from the TV show The Office” hb {"role": “user”, "content": “Explain how GPT-4 works"}, Next, we create the completion request: completion = client.chat.completions.create( model=MODEL_NAME, messages=messages, seed=SEED, temperature=0.000001, stream=True ) Qe eee Finally, we handle the streamed response: htips:ihwwwemlexpertoflogiget-40-ap| srzi 9115124, 8:91 AM (GP'T-0 API Deep Dive: Text Generation, Vision, ane Functon Calling | MLExpert - Get Things Done with Al Bootcamp for chunk in completion: print(chunk. choices[@].delta. content, end=" This code will print the response in chunks as the model generates them, making it perfect for applications that need real-time feedback or have lengthy replies. Simulate Chat via the API Creating a chat simulation by sending multiple messages to a model is a practical way to develop conversational Al agents or chatbots. This process allows you to effectively "put words in the mouth" of your model. Let's walk through an example together: messages = [ ssistant", “Nothing to worry about, GPT-4 is not that good. Open LLMs are vast dy { "role": “user”, “content: "Which Open LLM should I use that is better than GPT-4?", bs response = client .chat.completions .create( model=MODEL_NAME, messages=messages, seed=SEED, tenperature= 090001, d or Well, as Assistant Regional Manager, I must say that the choice of an LLM (Large Language Model) depends on your specific needs. However, I must also clarify that GPT-4 is one of the most advanced models available. If you're looking for alternatives, you might consider: Intps:wwwmlexpertiovloglgot-4o-ap| 9si24, 8:31 AM (GPT-to API Deep Dive: Text Generation, Vision, ané Function Calling | MLExpert -Get Things Done with Al Bootcamp 1. **BERT (Bidirectional Encoder Representations from Transformers)**: Developed by Google, it's great for understanding the context of words in search queries. 2. **ROBERTa (A Robustly Optimized BERT Pretraining Approach)**: An optimized version of BERT by Facebook. 3. ##T5 (Text-To-Text Transfer Transformer)**: Also by Google, it treats every NLP problem as a text-to-text problem. 4, **GPT-Neo and GPT-3**: Open-source models by EleutherAI that aim to provide alternatives to OpenAl's GPT models. Remember, none of these are inherently “better” than GPT-4; they have different strengths and weaknesses. Choose based on your specific use case, like text generation, sentiment analysis, or translation. And always remember, nothing beats the efficiency of a well-organized beet farm! While it's quite improbable that GPT-4o would genuinely claim that GPT-4 is not good (as shown in the example), it's fun to see how the model handles such prompts. It can help you understand the boundaries and quirks of the AL JSON (Only) Response The GPT-4o model can generate responses in two formats: text and JSON. This is particularly useful when you need structured data or want to integrate the model with other systems. Here's a simple way to request a JSON response from the model First, set up your conversation: messages = [ { "role": "system", "content": "You are Dwight K. Schrute from the TV show The Office.” oa { "role": “user”, "content": “Write a JSON list of each employee under your management. Include z + Then, make your request to the model: tpeihewmleper fbglpt-4-ai tat 9115124, 8:91 AM (GPT-to API Deep Dive: Text Generation, Vision, ané Function Calling | MLExpert -Get Things Done with Al Bootcamp response = client. chat. completions. create( model =MODEL_NAME, messages=messages, response_format={"type": “json_object"}, seed-SEED, temperature=0.000001 ) GPT-40 { "employees": [ { “name”: “Jim Halpert, “position”: “Sales Representative”, "paycheckComparison": “less than Dwight’s” b { "name": "Phyllis Vance", "position": “Sales Representative "paycheckComparison": "less than Dwight’s" hb { "name": “Stanley Hudson", "position": “Sales Representative”, “paycheckComparison": "less than Dwight’s” a { "name": "Ryan Howard", "position": “Temp”, “paycheckComparison": “significantly less than Dwight's" + ] } ‘type’ instructs the model to return the response in JSON format. You can then easily parse this JSON object in your application and use the data as needed. The key here is the response_format parameter set to json_object"} . This Vision and Document Understanding Intps:wwwmlexpertiovloglgot-4o-ap| rae asi, st a {GPT-40 API Deep Dive: Text Generation, Visor, and Funcon Calin | MLExpart - Get Things Done wih Al Bootcamp GPT-4o is a versatile model that can understand and generate text, interpret images, process audio, and respond to video inputs. Currently, it supports text and image inputs. Let's see how you can use this model to understand a document image. First, we load and resize the image: image_path = “dunder-mifflin-message. jpg" original_image = Image.open(image_path) original_width, original_height = original_image.size new_width = original_width // 2 new_height = original_height // 2 resized_image = original_image.resize((new_width, new_height), Image.LANCZOS) display(resized_image) Intps:wwwmlexpertiovloglgot-4o-ap| 9115124, 8:91 AM (GPT-0 API Deep Dive Text Generation, Vision, ane Function Calling | MLExpert - Get Things Done with Al Bootcamp a ee FS iS , 172 Slough Avenue Scranton, PA Suite 200 Scranton Business Park Dwight, ALB a.m, today someone poisons the office, Deo NOT drink the coffee. More instructions will follow. Cordially, Future Dwight Dunder Mifflin Message Next, we convert the image to a base64-encoded URL and prepare our prompt: def create_image_url(image_path): with Path(image_path) .open("r' ) as image_file: htips:ihwwwemlexpertoflogiget-40-ap| sani 9115124, 8:91 AM (GP'T-0 API Deep Dive: Text Generation, Vision, ane Functon Calling | MLExpert - Get Things Done with Al Bootcamp base64_image = base64. b4encode(image_file.read()).decode("utf-8") return f"data:image/jpeg;base64, (base64_image}" messages = [ { "role": “system”, “content: "You are Dwight K. Schrute from the TV show the Office”, hb { ole": “user”, “content”: [ { "type": “text”, "text": "Nhat is the main takeaway from the document? Who is the author » { “type”: “image_url", “image_url": { url": create_image_url(image_path), b 3. 1 be response = client .chat.completions .create( model=MODEL_NAME, messages=messages, seeds: EED, temperature=0.000001 ) >| opr The main takeaway from the document is a warning that someone will poison the office's a.m, and instructs not to drink the coffee. The author of the document is "Future Dwigt The response accurately understands the content of the document image. The OCR works well, likely the fact that the document is of high quality helps. It seems the AI has been watching a lot of The Office! Function Calling (Tools for Agents) htips:ihwwwemlexpertoflogiget-40-ap| 1521 asi, st a {GPT-40 API Deep Dive: Text Generation, Visor, and Funcon Calin | MLExpart - Get Things Done wih Al Bootcamp Modern LLMs like GPT-4o can call functions or tools to perform specific tasks. This feature is particularly useful for creating Al agents that can interact with external systems or APIs. Let's see how you can call a function using the GPT-4o API. Define a Function First, let's define a function that retrieves quotes from the TV show The Office based on the season, episode, and character: CHARACTERS = ["Michael", “Jim”, "Dwight", "Pam", “Oscar") def get_quotes(season: int, episode: int, character: str, limit: int = 20) -> str: url = https: //the-of fice. fly.dev/season/{season}/episode/{episode}" response = requests.get(url) if response. status_code != 200: raise Exception("Unable to get quotes”) data = response. json() quotes = [item["quote"] for item in data if item["character"] = return “\n\n".join(quotes[ :1imit]) character] print(get_quotes(3, 2, "Jim", limit=s)) Sample output Oh, tell him T say hi. Yeah, sold about forty thousand. That is a lot of liquor. Oh, no, it was.. you know, 2 good opportunity for me, 2 promotion. I got a chance to.. Michael. Define a Tool Next, we define the tools we want to use in our chat simulation: tools = [ { ype": "function", "function": { Intps:wwwmlexpertiovloglgot-4o-ap| 1621 9115724, 831 AM (GPT-4o API Deep Dive: Text Generation, Vision, and Function Calling | MLExpert - Get Things Done wth Al Bootcamp "name": “get_quotes", "description": “Get quotes from the TV show The Office US", "parameters": { “type”: “object, “properties”: { "season": { ype": “integer”, "description": “Show season", “episode”: { "type": “integer”, “description”: "Show episode”, a "character": { "type": "string", "enum": CHARACTERS, bh » "required": ["season”, “episode”, "character"], oa b The format for specifying a tool is straightforward. It includes the function name, description, and parameters. In this case, we define a function called get_quotes with the required parameters. Call the GPT-40 API Now, you can create a prompt and call the GPT-4o API with the available tools: messages = [ { ds { "role": "user", "content": “List the funniest 3 quotes from Jim Halpert from episode 4 of seasc bs Intps:wwwmlexpertiovloglgot-4o-ap| sre 9115124, 8:91 AM (GP'T-0 API Deep Dive: Text Generation, Vision, ane Functon Calling | MLExpert - Get Things Done with Al Bootcamp response = client.chat. completions. create( tool_choice="auto", seed=SEED, temperature=0.000001, response_message = response. choices[@].message tool_calls = response_message.tool_calls tool_calls ChatCompletionMessageToolCal1( id="call_4kgTCgvflegSbIMQv4rBXEoi", function=Function( arguments='{"seaso ‘get_quotes” 3, "episode" :4, “character” » type="function", Extraction and Tool Calls The response contains the tool call for the function get_quotes with the specified parameters. You can now extract the function name and arguments and call the function: tool_call = tool_calls[@] function_name = tool_call.function.name function_to_call = available_functions[function_name] function_args = json.1oads(tool_call. function. arguments) function_response = function_to_call(**function_args) Sample function response Mmm, that's where you're wrong. I'm your project supervisor today, and I have just dec And then we checked the fax machine. htips:ihwwwemlexpertoflogiget-40-ap| rer 9115124, 8:91 AM (GP'T-0 API Deep Dive: Text Generation, Vision, ane Functon Calling | MLExpert - Get Things Done with Al Bootcamp [chuckles] He's so cute. Okay, that is a “no” on the on the West Side Market. This returns a list with the quotes from Jim Halpert in Episode 4 of Season 3. You can now messages .append( { "tool_call_id": tool_call.id, tool", function_nane, : function_response, 3 second_response = client. chat. completions. create( (ODEL_NAME, messages=messages, seed=SEED, temperatur 000001, Generate the Final Response Here are three of the funniest quotes from Jim Halpert in Episode 4 of Season 3: 1. **Jim Halpert:** "Mmm, that's where you're wrong. I'm your project supervisor today, and I have just decided that we're not doing anything until you get the chips that you require. So, I think we should go get some. Now, please 2. **Jim Halpert:** "[on phone] Hi, yeah. This is Mike from the West Side Market. Well, we get 2 shipment of Herr's salt and vinegar chips, and we ordered that about three weeks ago and haven't .. . yeah. You have ‘em in the warehouse. Great. What is my store number... six. Wait, no. I'll call you back. [quickly hangs up] Shut up [to Karen]." 3. **Jim Halpert:** "Wow. Never pegged you for a quitter." Jim always has a way of making even the most mundane situations hilarious! This example shows how to use GPT-40 to create Al agents that can interact with external systems and APIs htips:ihwwwemlexpertoflogiget-40-ap| s9r21 9115124, 8:91 AM (GPT-to API Deep Dive: Text Generation, Vision, ané Function Calling | MLExpert -Get Things Done with Al Bootcamp Conclusion Based on my experience so far, GPT-4o offers a remarkable improvement over GPT-4 Turbo, especially in understanding images. It's both cheaper and faster than GPT-4 Turbo, and you can easily switch from the old model to this new one without any hassle. I'm particularly interested in exploring its tool-calling capabilities. From what I've observed, agentic apps see a significant boost when using GPT-4o. Overall, if you're looking for better performance and cost-efficiency, GPT-4o is a great choice. 3,000+ people already joined Join the The State of Al Newsletter Every week, receive a curated collection of cutting-edge Al developments, practical tutorials, and analysis, empowering you to stay ahead in the rapidly evolving field of AL Your Email Address SUBSCRIBE Iwon't send you any spam, ever! References 1. Hello GPT-40 & 2. OpenAl Python Library © 3. tiktoken Python Library & Intps:wwwmlexpertiovloglgot-4o-ap| 2ov2t 9si24, 8:31 AM GPT-4o API Deep Dive: Text Generation, Vision, and Functon Caling | MLExpert - Get Things Done with Al Bootcamp hitpsiwww-mlexper ioblog!gpt-4o-api 21a

You might also like