Prompt Engineering
• Prompt engineering is the art of asking the right question to get the best output
from an LLM. It enables direct interaction with the LLM using only plain language
prompts.
• In the past, working with machine learning models typically required deep
knowledge of datasets, statistics, and modeling techniques. Today, LLMs can be
programmed in English, as well as in other natural languages.
• A large language model (LLM) is a type of artificial intelligence (AI) program that can
recognize and generate text, among other tasks.
• Prompt engineering is the process of structuring text that can be interpreted and understood by
a generative AI model. A prompt is natural language text describing the task that an AI should
perform.
• A prompt for a text-to-text model can be a query, a command, a short statement of feedback or a
longer statement including context, instructions and input data.
• Prompt engineering may involve phrasing a query, specifying a style, providing relevant
context, or assigning a role to the AI. A prompt may include a few examples for a model
to learn from, an approach called few-shot learning.
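The few-shot idea can be sketched in plain Python: the prompt itself carries a handful of labeled examples, and the model is expected to continue the pattern. The sentiment task and helper function below are illustrative, not tied to any particular model or API.

```python
# Minimal sketch of few-shot prompting: labeled examples are embedded
# directly in the prompt text, and the model completes the final line.

def build_few_shot_prompt(examples, query):
    """Assemble a plain-text prompt from (input, output) example pairs."""
    lines = ["Classify the sentiment of each review as Positive or Negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # the model is expected to complete this line
    return "\n".join(lines)

examples = [
    ("The battery lasts all day.", "Positive"),
    ("It broke after one week.", "Negative"),
]
print(build_few_shot_prompt(examples, "Setup was quick and painless."))
```

The resulting string would then be sent to any text-to-text model; with zero examples the same function produces a zero-shot prompt.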
• When communicating with a text-to-image or a text-to-audio model, a typical prompt is a
description of a desired output. Prompting a text-to-image model may involve adding, removing,
emphasizing and re-ordering words to achieve a desired subject, style, layout, lighting, and
aesthetic.
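That word-level control can be illustrated with a small sketch that composes an image prompt from named components and re-orders them. The component names are assumptions for illustration; the idea that earlier words often carry more weight is a common prompting heuristic, not a guarantee for any specific model.

```python
# Illustrative sketch: composing a text-to-image prompt from separate
# components (subject, style, lighting, aesthetic). Re-ordering the
# components shifts emphasis -- a common heuristic is that earlier
# words often influence the output more.

def compose_prompt(parts, order):
    """Join the prompt components that exist, in the requested order."""
    return ", ".join(parts[key] for key in order if parts.get(key))

parts = {
    "subject": "a lighthouse on a cliff",
    "style": "oil painting",
    "lighting": "golden hour light",
    "aesthetic": "dramatic, high detail",
}

# Subject-first vs. style-first orderings of the same components:
print(compose_prompt(parts, ["subject", "style", "lighting", "aesthetic"]))
print(compose_prompt(parts, ["style", "subject", "lighting", "aesthetic"]))
```

Adding, removing, or emphasizing a word then amounts to editing one component and regenerating the prompt string.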
GENERATIVE ARTIFICIAL
INTELLIGENCE
• Generative artificial intelligence (generative AI, GAI, or GenAI) is artificial intelligence
capable of generating text, images, or other media, using generative models.
• Generative AI models learn the patterns and structure of their input training data and
then generate new data that has similar characteristics.
• In the early 2020s, advances in transformer-based deep neural networks enabled a
number of generative AI systems notable for accepting natural language prompts as
input.
• These include large language model (LLM) chatbots such as ChatGPT, Copilot, Bard,
and LLaMA, and text-to-image artificial intelligence art systems such as Stable
Diffusion, Midjourney, and DALL-E.
• Generative AI has uses across a wide range of industries, including art, writing, script
writing, software development, product design, healthcare, finance, gaming, marketing, and
fashion.
• Investment in generative AI surged during the early 2020s, with large companies such as
Microsoft, Google, and Baidu as well as numerous smaller firms developing generative AI
models.
• However, there are also concerns about the potential misuse of generative AI,
including cybercrime, the creation of fake news, or the production of deepfakes that can be
used to deceive or manipulate people.
GENERATIVE AI, GAI, OR GENAI
TOOLS
• The field of machine learning often uses statistical models,
including generative models, to model and predict data.
• Beginning in the late 2000s, the emergence of deep learning drove progress
and research in image classification, speech recognition, natural language
processing and other tasks.
• In March 2023, GPT-4 was released. A team from Microsoft Research argued that it
could reasonably be viewed as an early (yet still incomplete) version of an artificial
general intelligence (AGI) system.
• Other scholars have disputed that GPT-4 reaches this threshold, arguing that, as of
2023, generative AI remained far from the benchmark of ‘general human intelligence’.
MIDJOURNEY
• The company has been working on improving its algorithms, releasing new model
versions every few months. Version 2 of their algorithm was launched in April 2022 and
version 3 on July 25, 2022. On November 5, 2022, the alpha iteration of version 4 was
released to users.
• On March 15, 2023, the alpha iteration of version 5 was released. The 5.1 model is more
opinionated than version 5, applying more of its own stylization to images, while the 5.1
RAW model reduces that stylization and works better with more literal prompts.
• Version 5.2 followed with further improvements to image quality. On December 21,
2023, the alpha iteration of version 6 was released. The model was trained from scratch
over a nine-month period, adding support for better text rendering and a more literal
interpretation of prompts.
FUNCTIONALITY
• Midjourney is currently accessible only through a Discord bot on its official Discord
server, by direct-messaging the bot, or by inviting the bot to a third-party server. To
generate images, users invoke the /imagine command and type in a prompt; the bot then
returns a set of four images. Users may then choose which images they want to upscale.
Midjourney is also working on a web interface.
• Beyond the /imagine command, Midjourney offers many other commands for the
Discord bot. These include the /blend command, which allows the user to blend two
images, and the /shorten command, which suggests ways to make a long prompt
shorter, among others that improve the Midjourney experience.
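As an illustration, the commands above can be rendered as the literal text a user would type into Discord. The helper functions below are hypothetical, and the exact command syntax may change between Midjourney versions (for instance, /blend in practice takes image attachments rather than text arguments).

```python
# Hypothetical helpers that build the Discord slash-command text a user
# would send to the Midjourney bot. Illustrative only; syntax may vary.

def imagine(prompt):
    """Request a set of four generated images from a text prompt."""
    return f"/imagine prompt: {prompt}"

def blend(image_a, image_b):
    """Request a blend of two images (the real command takes attachments)."""
    return f"/blend {image_a} {image_b}"

def shorten(prompt):
    """Ask the bot to suggest a shorter version of a long prompt."""
    return f"/shorten {prompt}"

print(imagine("a watercolor fox in a snowy forest"))
print(shorten("an extremely long and very detailed prompt with many redundant words"))
```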
DALL-E
• DALL·E 3 was released natively into ChatGPT for ChatGPT Plus and
ChatGPT Enterprise customers in October 2023, with availability via
OpenAI's API and "Labs" platform provided in early November. Microsoft
implemented the model in Bing's Image Creator tool and plans to
implement it into their Designer app.
TECHNOLOGY
• The first generative pre-trained transformer (GPT) model was initially developed by OpenAI
in 2018, using a Transformer architecture. The first iteration, GPT-1, was scaled up to
produce GPT-2 in 2019; in 2020 it was scaled up again to produce GPT-3, with 175 billion
parameters.
• DALL·E's model is a multimodal implementation of GPT-3 with 12 billion parameters which
swaps text for pixels, trained on text-image pairs from the Internet. In detail, the input to
the Transformer model is a tokenized image caption followed by tokenized
image patches.
• The image caption is in English, tokenized by byte pair encoding, and can be up to 256
tokens long. Each image is a 256x256 RGB image, divided into a 32x32 grid of patches of
8x8 pixels each. Each patch is then converted by a discrete variational autoencoder to a token.
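The token arithmetic can be checked in a few lines, starting only from the numbers in the description: a 256-token caption budget, a 256x256-pixel image, and a 32x32 token grid produced by the discrete VAE.

```python
# Back-of-the-envelope check of DALL-E's input layout: up to 256 BPE
# text tokens, followed by image tokens from a 32x32 grid over a
# 256x256 image (so each image token covers one square patch).

IMAGE_SIZE = 256        # pixels per side
GRID = 32               # dVAE token grid per side
MAX_TEXT_TOKENS = 256   # caption budget in BPE tokens

patch_pixels = IMAGE_SIZE // GRID                  # pixels per patch side
image_tokens = GRID * GRID                         # image tokens per picture
sequence_length = MAX_TEXT_TOKENS + image_tokens   # full Transformer input

print(patch_pixels, image_tokens, sequence_length)  # 8 1024 1280
```

So each image token covers an 8x8-pixel patch, and the full input sequence to the Transformer is at most 1,280 tokens.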
CAPABILITIES