PROMPT ENGINEERING

Artificial intelligence (AI) refers to the intelligence exhibited by machines and software, with applications in various fields such as web search, speech understanding, and self-driving cars. Prompt engineering involves crafting effective queries for large language models (LLMs) to generate desired outputs, while generative AI can create text, images, and other media based on learned patterns from training data. Notable generative AI systems include DALL-E and Midjourney, which utilize advanced neural networks to produce high-quality outputs from natural language prompts.


WHAT IS ARTIFICIAL INTELLIGENCE (AI)

• Artificial intelligence (AI) is the intelligence of machines or software, as opposed
to the intelligence of humans or animals. Such machines may be called AIs. It is a
field of study in computer science that develops and studies intelligent machines.

• AI technology is widely used throughout industry, government, and science. Some
high-profile applications are advanced web search engines, recommendation systems,
understanding of human speech, generative and creative tools, superhuman play and
analysis in strategy games, and self-driving cars.
PROMPT ENGINEERING

• Prompt engineering is the art of asking the right question to get the best output
from an LLM. It enables direct interaction with the LLM using only plain language
prompts.

• In the past, working with machine learning models typically required deep
knowledge of datasets, statistics, and modeling techniques. Today, LLMs can be
programmed in English, as well as in other languages.

• A large language model (LLM) is a type of artificial intelligence (AI) program that can
recognize and generate text, among other tasks.
• Prompt engineering is the process of structuring text that can be interpreted and understood by
a generative AI model. A prompt is natural language text describing the task that an AI should
perform.
• A prompt for a text-to-text model can be a query, a command, a short statement of feedback or a
longer statement including context, instructions and input data.
• Prompt engineering may involve phrasing a query, specifying a style, providing relevant
context, or assigning a role to the AI. A prompt may include a few examples for a model
to learn from, an approach called few-shot learning (see the example prompt after this list).
• When communicating with a text-to-image or a text-to-audio model, a typical prompt is a
description of a desired output. Prompting a text-to-image model may involve adding, removing,
emphasizing and re-ordering words to achieve a desired subject, style, layout, lighting, and
aesthetic.
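
• A minimal sketch of a few-shot prompt for a text-to-text model (the task and reviews
below are invented for illustration). The labeled examples show the model the expected
input-output format, and the final line is left for the model to complete:

    Classify the sentiment of each review as Positive or Negative.
    Review: "The battery lasts all day."        Sentiment: Positive
    Review: "The screen cracked within a week." Sentiment: Negative
    Review: "Setup took only five minutes."     Sentiment: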
GENERATIVE ARTIFICIAL
INTELLIGENCE
• Generative artificial intelligence (generative AI, GAI, or GenAI) is artificial intelligence
capable of generating text, images, or other media, using generative models.
• Generative AI models learn the patterns and structure of their input training data and
then generate new data that has similar characteristics.
• In the early 2020s, advances in transformer-based deep neural networks enabled a
number of generative AI systems notable for accepting natural language prompts as
input.
• These include large language model (LLM) chatbots such as ChatGPT, Copilot, Bard,
and LLaMA, and text-to-image artificial intelligence art systems such as Stable
Diffusion, Midjourney, and DALL-E.
• Generative AI has uses across a wide range of industries, including art, writing, script
writing, software development, product design, healthcare, finance, gaming, marketing, and
fashion.

• Investment in generative AI surged during the early 2020s, with large companies such as
Microsoft, Google, and Baidu as well as numerous smaller firms developing generative AI
models.

• However, there are also concerns about the potential misuse of generative AI,
including cybercrime, the creation of fake news, or the production of deepfakes that can be
used to deceive or manipulate people.
GENERATIVE AI, GAI, OR GENAI
TOOLS
• The field of machine learning often uses statistical models,
including generative models, to model and predict data.

• Beginning in the late 2000s, the emergence of deep learning drove progress
and research in image classification, speech recognition, natural language
processing and other tasks.

• Neural networks in this era were typically trained as discriminative models, due to
the difficulty of generative modeling.
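
• Informally, a discriminative model learns a conditional distribution such as p(label | image)
for tasks like classification, whereas a generative model learns the data distribution p(image)
itself (or the joint p(image, label)) and can therefore sample new, similar data.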
GAI BREAK IN 2014 AND 2017
• In 2014, advancements such as the variational autoencoder and generative adversarial
network produced the first practical deep neural networks capable of learning generative
models, as opposed to discriminative ones, for complex data such as images.
• These deep generative models were the first to output not only class labels for images but
also entire images.
• In 2017, the Transformer network enabled advancements in generative models compared
to older Long Short-Term Memory models, leading to the first generative pre-trained
transformer (GPT), known as GPT-1, in 2018.
• This was followed in 2019 by GPT-2, which demonstrated the ability to generalize
unsupervised to many different tasks as a foundation model.
GAI BREAK IN 2021 AND 2023

• In 2021, the release of DALL-E, a transformer-based pixel generative model, followed
by Midjourney and Stable Diffusion, marked the emergence of practical high-quality
artificial intelligence art from natural language prompts.

• In March 2023, GPT-4 was released. A team from Microsoft Research argued that it
could reasonably be viewed as an early (yet still incomplete) version of an artificial
general intelligence (AGI) system.

• Other scholars have disputed that GPT-4 reaches this threshold, calling generative
AI still far from reaching the benchmark of ‘general human intelligence’ as of 2023.
MIDJOURNEY

• Midjourney is a generative artificial intelligence program and service created and
hosted by San Francisco–based independent research lab Midjourney, Inc. Midjourney
generates images from natural language descriptions, called prompts, similar to
OpenAI's DALL-E and Stability AI's Stable Diffusion.

• It is one of the technologies of the AI Spring. The tool is currently in open beta,
which it entered on July 12, 2022. The Midjourney team is led by David Holz, who
co-founded Leap Motion. Holz told The Register in August 2022 that the company was
already profitable. Users create artwork with Midjourney using Discord bot commands.
MODEL VERSIONS

• The company has been working on improving its algorithms, releasing new model
versions every few months. Version 2 of their algorithm was launched in April 2022 and
version 3 on July 25. On November 5, 2022, the alpha iteration of version 4 was
released to users.
• On March 15, 2023, the alpha iteration of version 5 was released. The 5.1 model is more
opinionated than version 5, applying more of its own stylization to images, while the 5.1
RAW model works better with more literal prompts.
• Version 5.2 was later released with improved image quality. On December 21, 2023, the
alpha iteration of version 6 was released. The model was trained from scratch over a
nine-month period. Support was added for better text rendition and a more literal
interpretation of prompts.
FUNCTIONALITY

• Midjourney is currently only accessible through a Discord bot on their official Discord
server, by direct messaging the bot, or by inviting the bot to a third-party server. To
generate images, users use the /imagine command and type in a prompt; the bot then
returns a set of four images. Users may then choose which images they want to upscale.
Midjourney is also working on a web interface.

• Beyond the /imagine command, Midjourney offers many other commands to send to the
Discord bot, including but not limited to the /blend command, which allows the user to
blend two images, and the /shorten command, which gives the user suggestions on how to
make a long prompt shorter.
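
• For example, a user might type /imagine prompt: a watercolor painting of a lighthouse
at dusk to receive four candidate images, or use /blend with two uploaded images to merge
them; these prompts are illustrative, and the exact options vary between model versions.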
DALL-E

• DALL·E, DALL·E 2, and DALL·E 3 are text-to-image models developed by OpenAI using
deep learning methodologies to generate digital images from natural language
descriptions, called "prompts".

• DALL·E 3 was released natively into ChatGPT for ChatGPT Plus and
ChatGPT Enterprise customers in October 2023, with availability via
OpenAI's API and "Labs" platform provided in early November. Microsoft
implemented the model in Bing's Image Creator tool and plans to
implement it into their Designer app.
TECHNOLOGY
• The first generative pre-trained transformer (GPT) model was initially developed by OpenAI
in 2018, using a Transformer architecture. The first iteration, GPT-1, was scaled up to
produce GPT-2 in 2019; in 2020 it was scaled up again to produce GPT-3, with 175 billion
parameters.
• DALL·E's model is a multimodal implementation of GPT-3 with 12 billion parameters which
swaps text for pixels, trained on text-image pairs from the Internet. In detail, the input to
the Transformer model is a tokenized image caption followed by tokenized image patches.
• The image caption is in English, tokenized by byte pair encoding, and can be up to 256
tokens long. Each image is a 256x256 RGB image, divided into a 32x32 grid of patches
(each 8x8 pixels). Each patch is then converted by a discrete variational autoencoder to a
token.
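
• A rough Python sketch of this sequence layout, using the figures above (the helper below
is illustrative, not OpenAI's code):

    # Illustrative layout of DALL-E's Transformer input: caption tokens, then image tokens.
    MAX_CAPTION_TOKENS = 256                 # BPE-tokenized English caption, up to 256 tokens
    IMAGE_GRID = 32                          # 256x256 image -> 32x32 grid of dVAE tokens
    IMAGE_TOKENS = IMAGE_GRID * IMAGE_GRID   # 1024 image tokens

    def build_sequence(caption_tokens, image_tokens):
        """Concatenate caption and image tokens into one input sequence (at most 1280 tokens)."""
        assert len(caption_tokens) <= MAX_CAPTION_TOKENS
        assert len(image_tokens) == IMAGE_TOKENS
        return list(caption_tokens) + list(image_tokens)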
CAPABILITIES

• DALL·E can generate imagery in multiple styles, including photorealistic imagery,
paintings, and emoji. It can "manipulate and rearrange" objects in its images, and
can correctly place design elements in novel compositions without explicit instruction.
• DALL·E can produce images for a wide variety of arbitrary descriptions from
various viewpoints with only rare failures. Mark Riedl, an associate
professor at the Georgia Tech School of Interactive Computing, found that
DALL-E could blend concepts.
• Its visual reasoning ability is sufficient to solve Raven's Matrices. DALL·E 3
follows complex prompts with more accuracy and detail than its
predecessors, and is able to generate more coherent and accurate text.
DALL·E 3 is integrated into ChatGPT.
