Generative AI

Uploaded by

d.ramyasri007

Generative AI

“Generative AI is the most powerful tool for creativity that has ever been
created. It has the potential to unleash a new era of human innovation.”
-Elon Musk

Generative AI finds itself at the intersection of technology and creativity,
and its uses are boundless – many still undiscovered.

Generative Artificial Intelligence is one of the most well-known
contemporary technological innovations. It has created waves across
industries due to its revolutionary and disruptive potential.

Its prevalence in the tech world makes it an essential tool for CSE
students. The best colleges in India have incorporated generative AI into
their curriculum to ensure their graduates are industry-ready.

Generative AI is a type of AI system that can generate new content,
whether text, images, music, or other types of data. These systems are
trained on large datasets and learn styles, patterns and structures from
the data to generate new content similar to what they were trained on.

Generative AI has various applications, including:

• Natural Language Generation
• Image Synthesis
• Creative Content Creation
• Code Generation And Quality Assurance
• Sales And Advertising
• Graphic Design
• Video Marketing

Generative AI Interfaces

The best-known examples of Generative AI interfaces are ChatGPT, DALL-E, and
GitHub Copilot. Let’s delve deeper into these and understand their
applications.

ChatGPT

ChatGPT is a powerful example of a generative AI interface in natural
language understanding and generation. The tool lets users interact with
the model through text-based conversations, enabling various
applications and use cases. Its user-friendly interface has made it popular
in all segments, from students to IT experts.

As a CSE graduate, exploring the capabilities of GPT-3 for tasks like code
generation, content creation, and language translation could be
path-breaking.

DALL-E

For CSE graduates intrigued by the intersection of AI and creativity,
DALL-E stands as a testament to the endless possibilities awaiting
exploration in the dynamic landscape of Generative AI. The tool can be used
in graphic design and illustration and can assist in website development.

GitHub Copilot

GitHub Copilot is a revolutionary tool for Computer Science
Engineering students. It can massively aid students in learning, growing,
and advancing as coders. It has many functions that BTech Computer
Science And Engineering students can utilise. Some of these are:

Faster Coding: Copilot provides real-time code suggestions and
completions as you type, significantly speeding up the coding process.
This feature particularly benefits Computer Science students grappling
with complex syntax or looking to enhance their coding speed.

Reduced Debugging Time: The tool assists in writing correct and
syntactically accurate code, reducing the likelihood of common errors.
This minimises debugging time, enabling students to concentrate on
building robust and functional applications.

Exposure to Diverse Code Patterns: Copilot exposes students to a diverse
range of code patterns and styles. This exposure is crucial for broadening
their horizons and introducing them to various approaches.

Types of Generative AI Models

Recurrent Neural Networks (RNN)


RNNs, a fundamental generative model, excel in sequence-based tasks,
making them valuable for applications like language generation and music
composition. BTech Computer Science And Engineering students can
leverage RNNs to model and predict patterns in data over time, such as
stock prices, weather conditions, or sensor readings.
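
To make the idea of modelling patterns over time concrete, here is a minimal sketch of a single recurrent cell in plain Python. It is an illustration only: the weights (`w_xh`, `w_hh`, `b_h`) are fixed, made-up numbers rather than learned ones, the state is a single scalar, and the readings are invented sample data.

```python
import math

def rnn_step(x, h_prev, w_xh, w_hh, b_h):
    """One recurrent update: h_t = tanh(w_xh * x_t + w_hh * h_prev + b_h)."""
    return math.tanh(w_xh * x + w_hh * h_prev + b_h)

def run_rnn(sequence, w_xh=0.5, w_hh=0.8, b_h=0.0):
    """Feed a sequence through the cell and collect every hidden state."""
    h = 0.0  # initial hidden state
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh, b_h)
        states.append(h)
    return states

readings = [0.1, 0.4, 0.3, 0.9]  # hypothetical sensor readings
states = run_rnn(readings)
print(states)
```

Each hidden state depends on the current input and on the previous state, which is how information from earlier time steps carries forward. A real RNN would use weight matrices and learn them from data.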

Generative Adversarial Networks (GAN)


GANs operate on a competitive basis, with a generator creating content
and a discriminator evaluating its authenticity. This dynamic enhances the
model’s ability to produce realistic outputs.
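
The competition between the two networks can be made concrete through their loss values. The sketch below assumes the commonly used binary cross-entropy form of the GAN losses (with the non-saturating generator loss); the probability values are hypothetical discriminator outputs chosen for illustration, not results from a trained model.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Cross-entropy loss: low when real samples score near 1 and fakes near 0."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)

def generator_loss(d_fake):
    """Non-saturating generator loss: low when fakes are scored as real."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs (probability that a sample is real).
real_scores = [0.90, 0.95, 0.85]
early_fakes = [0.05, 0.10, 0.08]  # generator is easily caught
later_fakes = [0.45, 0.55, 0.50]  # generator fools the discriminator about half the time

print(generator_loss(early_fakes), generator_loss(later_fakes))
print(discriminator_loss(real_scores, early_fakes),
      discriminator_loss(real_scores, later_fakes))
```

As the generated samples become harder to tell apart from real ones, the generator's loss falls while the discriminator's loss rises, which is the dynamic described above.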

Computer Science Engineering students with a particular interest in
cybersecurity will find this model beneficial. GANs can be employed for anomaly
detection by learning the normal patterns within a dataset. This is
relevant in cybersecurity, where identifying unusual patterns can help
detect potential threats and vulnerabilities.

Transformer Models
Transformer models, like OpenAI’s GPT series, have gained prominence
for their attention-based architecture, enabling contextual understanding
and generating coherent and contextually relevant content.
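
The attention mechanism at the heart of these models can be sketched in a few lines. The example below is a simplified illustration of scaled dot-product self-attention in plain Python; it assumes tiny hand-written token vectors and omits the learned query, key, and value projections and the multiple heads that real transformer models use.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Three hypothetical 2-dimensional token vectors used as Q, K, and V.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
print(weights[0])
```

Each token's output mixes information from every other token, weighted by similarity, which is what lets the model take context into account.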

Transformers power question-answering systems, where models can
comprehend and respond to user queries based on context. This is
valuable for BTech Computer Science And Engineering students interested
in developing intelligent chatbots or search engines.

offers several important electives:

• Reinforcement Learning
• Natural Language Processing
• Cognitive Modelling
• Advanced Computer Vision and Video Analytics
• Image and Video Processing

In terms of specialisation in AI, these courses meet the highest worldwide
standards. The University’s ties with NVIDIA, SAP, Hitachi, and Times
Internet allow students to collaborate on cutting-edge projects. The
holistic curriculum and real-world exposure prime students to thrive in
emerging industries like AI.

Conclusion

Generative AI represents a frontier of possibilities for CSE graduates.
Whether you are inclined towards natural language processing, visual
arts, or coding, there’s a space for you in the generative AI landscape.

Students intrigued by AI will find their place at the Best Colleges In India,
training to contribute to the transformative power of AI in the digital era.
Generative AI has the potential to bring in a new era of human innovation,
and while everyone is clamouring to understand it, CSE students have the
opportunity to be a part of this transformation.

https://www.youtube.com/watch?v=aKSVZjti9q0

Identify the impact of generative AI on a few prominent industry sectors.

Generative AI is a groundbreaking force in the technological and business
landscape, leading to a profound transformation across diverse
industries.

As indicated in a report by McKinsey in 2023, Generative AI is projected to
contribute between $2.6 trillion and $4.4 trillion annually to the global
economy, showcasing the immense value it can bring to industries.

The growth trajectory of generative AI across industries is not confined to
mere numbers, but represents a fundamental shift in the way industries
operate.

Generative AI isn't just automating tasks, it's refining workflows from
conceptualizing designs to optimizing processes.

Generative AI is fostering a culture of constant innovation, allowing faster
adaptation to changing needs and market demands.

One of the transformative potentials of generative AI is
hyper-personalization.

Imagine personalized customer experiences, marketing campaigns, and
product recommendations tailored to individual needs.

Generative AI models can analyze vast amounts of data in real time to
predict outcomes, optimize operations, and make informed decisions.

Across different industries, generative AI models are encouraging
collaboration between humans and AI to boost human creativity,
productivity, and efficiency.

Generative AI has a transformative impact on almost every industry.

Let's explore its impact on a few prominent industry sectors.

The impact of generative AI on high tech industries is profound and
multifaceted.

Generative AI is contributing to circuit optimization in semiconductor
companies such as NVIDIA and Qualcomm.

Generative AI is resulting in faster and more efficient chip designs.

Generative AI is making cybersecurity smarter.

It can create attack simulations for training security analysts and testing
system effectiveness, identify threats, and establish predictive models.

Companies such as Palo Alto Networks use generative AI to detect
anomalies and prevent potential breaches proactively.

For software development, generative AI automates repetitive coding
tasks and aids in testing and debugging.

Generative AI products such as GitHub Copilot can help developers write
code faster, smarter, and more efficiently.

In robotics and automation, generative AI is pivotal in designing and
controlling autonomous robots capable of continuous learning and
improvement.

Companies like Tesla and Boston Dynamics leverage AI to enhance robots'
dexterity and problem-solving skills.

Generative AI is poised to revolutionize the manufacturing landscape.

As per a market research report, the size of the generative AI in
manufacturing market was 223.4 million US dollars in 2022, and is
projected to exceed 6,398.8 million US dollars by 2032.
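
As a quick check on what those two figures imply, the compound annual growth rate (CAGR) can be computed directly. This is a small sketch that assumes a ten-year span from 2022 to 2032 and the standard CAGR formula; the function name `cagr` is just an illustrative choice.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Market size in millions of US dollars, as quoted above.
rate = cagr(223.4, 6398.8, 2032 - 2022)
print(f"Implied CAGR: {rate:.1%}")
```

Under these assumptions the implied growth rate works out to roughly 40% per year.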

Generative AI speeds up manufacturing design and fosters innovation by
automating intricate tasks and rapidly creating and evaluating design
alternatives.

Airbus, for instance, reduced aircraft partition wall weight by 45%, using
generative design.

Generative AI can simulate production scenarios, streamline various tasks,
and minimize downtime.

For example, Siemens leverages foundation models to optimize
production processes and improve product quality.

Further, generative AI models can forecast demand and optimize
inventory levels.

Additionally, generative AI can create optimal supply chain models by
considering factors like costs, delivery times, and reliability.

In the finance industry, generative AI exhibits the potential to automate
routine tasks, improve risk mitigation, and streamline financial operations.

According to Goldman Sachs Research, the use of generative AI in finance
is expected to increase global gross domestic product, or GDP, by 7% and
boost productivity growth by 1.5 percentage points.

In the banking industry, generative AI has the potential to enhance the
efficiencies already delivered by artificial intelligence.

According to McKinsey, generative AI could deliver value equal to an
additional $200 billion to $340 billion annually if its use cases in banking
are fully implemented.

Generative AI is poised to transform the banking industry by enhancing
personalized customer experiences, predicting customers' financial
needs and behaviors to provide tailored recommendations.

For instance, Morgan Stanley utilizes OpenAI-powered chatbots to aid
their team of financial advisors by analyzing internal research and data to
provide instant personalized insights to clients.

Moreover, generative AI enables more accurate credit risk assessments
through extensive data sets.

Additionally, real-time analysis of transactions and data helps identify
suspicious activity, thereby reducing fraud.

As per McKinsey, the retail and consumer packaged goods, or CPG,
industry holds the potential to deliver $400 billion to $660 billion a year by
leveraging generative AI.

For CPG companies, generative AI enables personalized customer journeys
and marketing campaigns, streamlined marketing and brand content,
enhanced chatbots, and dynamic pricing.

It can aggregate market data to test concepts, ideas, and models.

CPG companies are increasingly integrating generative AI into their
operations.

For instance, Nestle is using generative AI to validate new product ideas
and create market research reports.

Tapestry, the parent company of brands like Kate Spade and Coach,
utilizes generative AI to automate online personalization.

In the healthcare industry, generative AI is redefining drug discovery,
personalizing medicine, and accelerating medical research.

Pharmaceutical firms usually allocate around 20% of revenues to R&D,
and developing a new drug spans 10 to 15 years on average.

Leveraging foundation models and generative AI can expedite this and
yield substantial value.

According to McKinsey, generative AI could reduce the cost of drug
development by up to 40% and halve the time it takes to bring new drugs
to market.

Across different industries, generative AI's potential and impact vary in
different functions.

For instance, in high tech industries, the greatest impact will be on
software engineering, while in banking, it will significantly affect customer
operations.

In the medical field, the impact will be notable in product and R&D.

Marketing and sales functions experience high impact across various
industries.

Approximately 75% of the estimated value derived from generative AI use
cases is concentrated in four key areas: customer operations, marketing
and sales, software engineering, and research and development.

Apart from the industries covered in this video, generative AI is
influencing other sectors, including media and entertainment, education,
construction, energy, and agriculture.

Generative AI isn't just a trend, it's a driving force behind the next wave of
innovation, efficiency, and personalized experiences across industries.

To summarize, in this video you learned about the impact of generative
AI, leading to a fundamental shift in the way industries operate.

Generative AI is impacting industries to redefine workflows, foster a
culture of innovation, attain hyper-personalization,
enable informed decision making, and boost creativity, productivity, and
efficiency.

Generative AI is making a profound and multifaceted impact on different
industries, including high tech, manufacturing, finance, retail, and
healthcare.

Tools for Image Generation

After watching this video, you'll be able to describe the basic capabilities
of generative AI models for image generation and explain the key
capabilities of common models and tools for image generation.
Generative AI image generation models can generate new images and customize real and
generated images to give you the desired output.
For example, you may want to generate an image of a child with a book in her hand.
Further, you may want to change the color of the book cover in the generated image.
Let's generate a new image using a free AI image generator, Freepik.
You need to enter a text prompt describing the image you want to create.
Let's say you enter the following prompt, a boat sailing on a calm lake at sunset, surrounded
by lush greenery and a serene sky.
Remember, how you describe your image and the words you include in the prompt determine
the accuracy and quality of the image that gets generated.
Let's select the style and generate the image.
Here, we have multiple images generated.
You can select and download an image, or you may want to generate other images by
modifying the prompt.
Let's look at some more possibilities of image generation models.
Image-to-image translation refers to transforming an image from one domain to another while
preserving the original matter and style.
Examples include converting sketches to realistic images, converting satellite images to maps,
converting security camera images to higher resolution images, and enhancing detail in
medical imaging.
Style transfer and fusion involve extracting the style from one image and applying it to
another, creating hybrid or fusion images, for example, converting a painting to a photograph.
Inpainting refers to reconstructing missing or damaged parts of an image to make it complete.
You can use this for art restoration, forensics, removing unwanted objects from images while
preserving continuity and context, and blending virtual objects into real-world scenes in
augmented reality.
Outpainting involves extending the original image by generating new parts to it that are like
extensions of the original.
This can be used for generating larger images, enhancing resolution, and creating panoramic
views.
The image generation and modification capabilities of generative models and tools have
evolved with the models that power them.
OpenAI's DALL-E is based on the GPT model.
Trained on large datasets of images and their textual descriptions, DALL-E can generate
high resolution images in multiple styles, including photorealistic images and paintings.
Newer versions of DALL-E provide capabilities for generating
multiple image variations and image transformation through inpainting and outpainting.
Stable Diffusion is an open source text-to-image diffusion model.
Diffusion models are generative models that can create high resolution images.
Stable Diffusion is primarily used to generate images based on text prompts,
though it can also be used for image-to-image translation, inpainting, and outpainting.
NVIDIA StyleGAN model separates the modeling of image content and image style, enabling
precise control over style for manipulating specific features like pose or facial expression.
StyleGAN has evolved to generate higher resolution images with more realistic details.
You can explore generative AI's text-to-image generation capabilities using free tools like
Craiyon, Freepik, and Picsart.
These tools can generate images in different forms and styles.
Fotor and Deep Art Effects offer a variety of pretrained styles and allow you to create your
own custom styles.
DeepArt.io is an online platform that turns photos into artwork of different styles.
Midjourney is a platform that hosts image generation communities, helping artists and
designers create images using AI and explore each other's creations.
Many generative AI image generators can also be integrated as APIs to embed their
functionality and capabilities into different software programs and tools.
Some popular image generators that offer APIs include DALL-E, Midjourney, and Craiyon.
Technology giants such as Microsoft and Adobe have also stepped into the world of AI image
generators.
Microsoft Bing Image Creator is based on the DALL-E model.
You can access this tool by navigating to bing.com/create or through Microsoft Edge.
This makes Microsoft Edge the first browser with an integrated AI image generator.
Adobe Firefly is a family of generative AI tools designed to integrate with Adobe's Creative
Cloud applications, such as Photoshop and Illustrator.
Firefly is trained on Adobe Stock photos, openly licensed content, and public domain content.
Firefly can take text prompts in over 100 languages and includes tools that allow you to
manipulate color, tone, lighting, and composition, along with generative fill, text effects,
generative recolor, 3D to image, and extend image.
In this video, you learned that generative AI based models and tools can generate new images
through both text and image prompts.
They also offer capabilities for image-to-image translation, style transfer, inpainting, and
outpainting. A few prominent image generation models include DALL-E, Stable Diffusion, and
StyleGAN.
There are several image generating tools available that offer diverse capabilities for image
generation and transformation.
A few image generators can also be integrated as APIs. You also learned that Adobe Firefly is
a family of generative AI tools designed to integrate with Adobe's Creative Cloud
applications.

Tools For Text Generation

After watching this video, you'll be able to describe the basics of text generation through
generative AI and explain the key capabilities of common models and tools for text generation.
At the core of the text generation capabilities of generative AI are large language models, or
LLMs.
Based on patterns and structures learned during training, LLMs interpret context, grammar,
and semantics to generate coherent and contextually appropriate text.
Drawing statistical relationships between words and phrases allows LLMs to adapt creative
writing styles for any given context.
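
The idea of generating text from statistical relationships between words can be illustrated with a toy bigram model. This is a deliberately tiny sketch, not how an LLM is built: it only counts which word follows which in a made-up one-line corpus, whereas LLMs learn far richer patterns with neural networks. The function names and the corpus are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def generate(counts, start, length, seed=0):
    """Sample up to `length` words, picking each next word by its observed frequency."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = counts.get(word)
        if not followers:
            break  # no observed continuation for this word
        candidates = list(followers.keys())
        weights = list(followers.values())
        word = rng.choices(candidates, weights=weights, k=1)[0]
        output.append(word)
    return " ".join(output)

corpus = "generative ai can generate text and generative ai can generate images"
model = train_bigrams(corpus)
sentence = generate(model, "generative", 6)
print(sentence)
```

Even this simple model produces text that resembles its training data, because each next word is chosen according to how often it followed the previous one.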
LLMs are the basis for many text generation models.
Two such examples are the generative pre-trained transformer, or GPT, and PaLM.
These models have evolved into multimodal models offering multiple capabilities.
Let's learn about the capabilities of these models through two popular tools, ChatGPT and Bard.
ChatGPT is based on GPT as the large language model and uses advanced natural language
processing, or NLP.
While ChatGPT originally took only text prompts as input to generate new content, the
newer versions can take both image and text inputs.
ChatGPT offers diverse capabilities for text generation.
It is capable of smooth and context-based conversations.
Let's start a conversation with ChatGPT to learn a concept. Input a prompt that says, "I've heard
about generative AI and want to learn more."
ChatGPT responds with some basic information based on the context.
You can take the conversation forward to refine the research by asking, "How can I use
generative AI to improve my storytelling skills?"
ChatGPT provides the response based on the context and the question you asked.
Feel free to experiment and guide the conversation further.
ChatGPT will build an informative and interesting conversational flow.
It can also help you with varied creative tasks.
Let's enter the prompt, "Help me to create slides to demonstrate the features of a learning
platform," and ChatGPT comes up with suggestions about the title, content, and visuals for
specific slides.
Although ChatGPT is most proficient in English, it can understand and respond to several
other languages.
Let's prompt it to write hello in French and Spanish, and it generates the desired output.
ChatGPT can also be a useful tool to assist you in learning a new language or any subject for
that matter.
Another popular text generation tool is Google Bard.
It is based on Google's advanced language model PaLM, short for Pathways Language Model.
PaLM is a combination of a transformer model and Google's Pathways AI platform.
Pathways AI is based on pathways, which are specialized modules responsible for a particular task,
such as NLP or machine translation.
In addition to the massive training dataset of text and code, it also pulls information from
sources on the Internet to respond to prompts.
Try experimenting with different prompts to explore the capabilities of Bard.
Let's try Bard with a prompt to get a summary of the latest news on a topic, such as "Provide a
summary of the latest news on the war in Ukraine."
It provides you with multiple drafts as the response. You can select one of these or regenerate.
Next, let's try Bard to generate ideas or solve a problem.
Let's prompt it to provide a strategy for a digital marketing campaign for promoting a fashion
brand.
It provides a step by step approach to the marketing campaign.
ChatGPT and Bard offer capabilities for other valuable use cases.
For example, they can help you with basic mathematics, statistics, and problem solving
through these subjects.
They are also proficient in financial analysis, investment research, budgeting and more.
Furthermore, ChatGPT and Bard can generate code and perform code-related tasks across
various programming languages and frameworks.
Having interacted with both ChatGPT and Bard, you'll notice that ChatGPT is more effective
in generating dynamic responses and maintaining conversational flow,
while Bard may be a better choice for researching the latest news or information on a topic,
as it has access to web sources through Google Search and Google Scholar.
It's important to realize that generative AI models, including GPT and PaLM, are evolving, so
their capabilities and features may change.
Apart from ChatGPT and Bard, there are other text generators as well.
Jasper, for example, generates high quality marketing content of any length tailored to a
brand's voice.
Rytr is a valuable tool for creating high-quality content for blogs, emails, SEO metadata, and
ads on social media.
Also, copy.ai is great for creating content for social media marketing and product
descriptions.
Another tool, Writesonic, offers specific templates for different types of text such as articles
in blogs, ads, and marketing.
There are also tools available for specific use cases.
For example, tools like Resoomer can generate a summary of a text by extracting key ideas or
concepts.
Next, tools like uClassify are used for classification to assign one or more categories to a
snippet of text.
Tools for sentiment analysis understand and generate text that reflects the underlying
emotions expressed in human language.
Examples include Brand24 and Repustate.
For multilingual language translation, you can use Language Weaver and Yandex.
It is important to note that many of the open source generative AI tools collect and review the
data shared with them to improve their systems.
This is an important consideration for interacting with these tools to avoid sharing any
confidential or sensitive information.
So, do we have open-source privacy-preserving alternatives?
The answer is yes.
GPT4All, for example, can be installed on your machines to run as a privacy aware chatbot
without Internet or a graphics processing unit.
Further, chatbots like H2O.ai and PrivateGPT are designed to protect user privacy by running
on local machines without any Internet connection using the power of LLMs.
Not only that, you can customize these tools for use within a specific organization by linking
them to your organization's documents and databases.
Generative AI based text generators offer several benefits.
These tools are good learning aids as they provide step by step explanations.
They can generate different forms of text quickly, enabling efficiency for writers and creators.
These tools enhance creativity and inspire new ideas. By enabling engaging and interactive
conversations, they're useful as virtual assistants and chatbots.
By automating repetitive writing tasks, they can increase productivity for organizations. With
multilingual support, they enable communication and content localization for global
audiences.
In this video, you learned that LLMs interpret context, grammar, and semantics to generate
coherent and contextually appropriate text.
LLMs are the basis for many text generation tools.
Two popular text generation tools are OpenAI's ChatGPT and Google Bard.
ChatGPT is based on GPT and Bard is based on PaLM.
Both ChatGPT and Bard can generate different kinds of text, translate languages, and answer
your questions in an interactive and informative way.
Some of the other tools we talked about include Jasper, Copy.ai, and Writesonic.
Open-source privacy-preserving text generators include GPT4All, H2O.ai, and PrivateGPT.

Tools for audio and video generation

After watching this video, you'll be able to describe how generative AI audio and video tools
create impactful media content, explain the key capabilities of generative AI audio and video
tools, and explore generative AI's ability to reimagine virtual worlds.
Market.us estimates that the generative AI music market, valued at $229 million in 2022,
will register a high CAGR of 28.6% to reach $2,660 million by 2032.
Generative AI music is created using generative AI audio capabilities.
Over the past few years, these capabilities have been helping companies and individuals, novice or
experienced, simplify their processes to bring their complicated visions to life.
Think about this. Suppose you've been putting off starting your podcast or adding
some sound effects to your remixes.
In that case, you'll love what generative AI audio tools can do for you.
They come in three categories, speech generation tools, music creation tools, and
tools that enhance audio quality.
Speech generation tools are mostly text to speech or TTS tools that convert text into audio.
While read aloud technology is not new, generative AI architecture has upgraded how this
technology works.
Deep learning algorithms are repeatedly trained on vast datasets of human speech.
This allows them to break down and efficiently replicate vocal characteristics, such as
pronunciation, speed, emotion, and intonation.
As a result, generative AI TTS tools create more accurate natural sounding speech, which is
especially helpful to those who struggle with visual impairment,
language barriers, and other reading disabilities.
On the fun side, these tools can help you listen to essays, feedback, and notes, which might
be easier than reading them.
They can also help you communicate better.
What if you wish to narrate your presentation in a standout manner?
You could log into LOVO, Synthesia, Murf.ai, or Listenr, and choose from vast libraries
of AI voices, languages, or emotions.
You could even create a unique voice or clone your voice.
Some tools will also let you edit your vocal track's pronunciation, tone, and speed to create a
professional-sounding final product.
What about music?
Let's say one sunny afternoon, the amateur musician in you is feeling motivated.
You could try Meta's AudioCraft, the generative AI tool pretrained on sound effects and 20,000
hours of Meta-owned or licensed music.
There's also Shutterstock's Amper Music, AIVA, Soundful, Google's Magenta, and the
GPT-4-powered WavTool.
These tools let you choose from extensive music banks, different music genres, instrumental
styles, and melodies.
All you need to do is enter a text prompt.
Based on your request, the tool will write short melodies or riffs, suggest or add instruments,
compose a new song, or create a soundtrack for your next YouTube or Instagram video.
Generative AI can also help you mix, master, and publish your final musical output on
popular streaming platforms.
You can even use audio enhancing tools.
These are pretrained to identify specific sounds and can add fun sounds to your audio or
remove unwanted ones.
For example, Descript can help you remove background noise, enhance low quality
recordings, and add the desired sound effects.
Audo AI cleans your files of unwanted noise.
Many music generation tools also possess audio editing and enhancement capabilities.
However, some projects need more than eclectic sound effects.
In 2022, Runway's generative AI tools were used in producing the Oscar-winning movie
Everything Everywhere All at Once.
Even if you're not making big cinema, you can use generative AI video tools in your
everyday life.
Let's say you're making a documentary on the lack of trees in your city.
You could log into Runway's Gen-1 tool, which transforms existing video clips into different
styles, or use Runway's Gen-2 tool to create a video using text, image, or video inputs.
Alternatively, you can use the EaseUS video toolkit or the Synthesia app.
These tools will allow you to upload photos.
If you don't have any, use text prompts to generate the images you need.
Additionally, you can use these tools to record a narration, enhance your audio, convert your
video file format, and publish your video.
Synthesia even allows you to create custom avatars to increase your brand recall.
Generative AI can enhance your virtual world experience.
You can create unique imaginative virtual worlds with hybrid characteristics and exotic
landscapes.
Generative models can also respond in real time, improving the accuracy of simulations.
Metaverse platforms employ generative AI to create a more personalized and engaging user
experience.
Gaming metaverses allow you to rapidly generate 3D objects and even create
avatars fitted with specific personality traits that reflect in their expressions, behaviors,
conversations, and decisions.
The Sandbox, for example, is a metaverse where users can instantly build, own, and market
their games globally.
Scenario AI helps create and connect customized mobile gaming assets.
In this video, you learned how generative AI audio and video tools can make an impact.
With a simple text prompt, you can produce human-sounding speech in multiple languages,
record songs, add sound effects or remove unwanted noise, publish professional videos and
animations, and build enhanced and exotic virtual worlds.

Deep Learning Basics and Environment Setup

In this chapter, we cover the essential knowledge for building and training deep learning
models, including Generative Adversarial Networks (GANs). We explain the
basics of deep learning, starting with a simple example of a learning algorithm based on
linear regression, and provide instructions for setting up a deep learning
programming environment using Python and Keras. We also discuss the importance of
computing power in deep learning and give guidelines for taking full
advantage of NVIDIA GPUs by maximizing the memory footprint, enabling the CUDA
Deep Neural Network library (cuDNN), and eventually using distributed training setups
with multiple GPUs. Finally, in addition to installing the libraries needed for
upcoming projects in this book, you will test your installation by building, from scratch, a
simple and efficient Artificial Neural Network (ANN) that learns from data how to
classify images of handwritten digits.

The following major topics will be covered in this chapter:

 Deep learning basics
 Deep learning environment setup
 The deep learning environment test

Deep learning basics

Deep learning is a subset of machine learning, which is a field of artificial intelligence that
uses mathematics and computers to learn from data and map it from some input to some
output. Loosely speaking, a map or a model is a function with parameters that maps the input
to an output. Learning the map, also known as the model, occurs by updating the parameters of
the map such that some expected empirical loss is minimized. The empirical loss is a measure
of distance between the values predicted by the model and the target values given the
empirical data.
Notice that this learning setup is extremely powerful because it does not require having an
explicit understanding of the rules that define the map. An interesting aspect of this setup is
that it does not guarantee that you will learn the exact map that maps the input to the output,
but rather some other map that is expected to predict the correct output.
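
The learning setup described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from this book: it assumes a one-dimensional linear map with parameters `w` and `b`, uses the mean squared error as the empirical loss, and updates the parameters with plain gradient descent. The toy data, the learning rate, and the function names are all chosen for illustration only.

```python
def predict(w, b, x):
    """The map: a function with parameters w and b."""
    return w * x + b


def empirical_loss(w, b, xs, ys):
    """Mean squared error between predicted and target values."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(xs, ys, learning_rate=0.05, steps=2000):
    """Learn w and b by gradient descent on the empirical loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
        # Move the parameters a small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (illustrative values only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, loss={empirical_loss(w, b, xs, ys):.6f}")
```

Note that the code never states the rule "multiply by 2 and add 1"; the parameters are recovered only by reducing the loss on the data.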

This learning setup, however, does not come without a price: some deep learning methods
require large amounts of data, especially when compared with methods that rely on feature
engineering. Fortunately, large amounts of free data, especially unlabeled data, are available in
many domains.

Meanwhile, the term deep learning refers to the use of multiple layers in an ANN to form
a deep chain of functions. The term ANN suggests that such models informally draw
inspiration from theoretical models of how learning could happen in the brain. ANNs, also
referred to as deep neural networks, are the main class of models considered in this book.

Artificial Neural Networks (ANNs)

Despite its recent success in many applications, deep learning is not new. According to Ian
Goodfellow, Yoshua Bengio, and Aaron Courville, it has gone through three eras:

Cybernetics between the 1940s and the 1960s

Connectionism between the 1980s and the 1990s

The current deep learning renaissance beginning in 2006

Mathematically speaking, a neural network is a graph consisting of non-linear equations
whose parameters can be estimated using methods such as stochastic gradient descent and
backpropagation. We will introduce ANNs step by step, starting with linear and logistic
regression.

Linear regression is used to estimate the parameters of a model to describe the relationship
between an output variable and the given input variables. It can be mathematically described
as a weighted sum of input variables:

$$f(x) = w^\top x + b$$

Here, the weight, $w$, and inputs, $x$, are vectors in $\mathbb{R}^n$; in other words, they are
real-valued vectors with $n$ dimensions, with $b$ as a scalar bias term and $f(x)$ as a scalar term
that represents the valuation of the function $f$ at the input $x$. In ANNs, the output of a single
neuron without non-linearities is similar to the output of the linear model described in the
preceding linear regression equation and the following diagram:

[Diagram not available: a single neuron without non-linearities]

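
As a minimal sketch of the weighted sum described above, the following Python function computes the output of a single neuron without non-linearities. The function name `linear_neuron` and the example values are assumptions made for illustration; they do not come from this book.

```python
def linear_neuron(weights, inputs, bias):
    """Weighted sum of the inputs plus a scalar bias."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Illustrative values: three input variables, so n = 3.
weights = [0.5, -1.0, 2.0]
inputs = [2.0, 1.0, 0.5]
bias = 0.25

output = linear_neuron(weights, inputs, bias)
print(output)  # prints 1.25
```
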
Logistic regression is a special version of regression where a specific non-linear function,
the sigmoid function, is applied to the output of the linear model in the earlier linear
regression equation:

$$f(x) = \sigma(w^\top x + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

In ANNs, the non-linear model described in the logistic regression equation is similar to
the output of a single neuron with a sigmoid non-linearity in the following diagram:

[Diagram not available: a single neuron with a sigmoid non-linearity]

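
A single neuron with a sigmoid non-linearity can be sketched as follows. This is an illustrative example rather than this book's implementation: the names `sigmoid` and `sigmoid_neuron` and the example values are assumptions, and the two-branch form of the sigmoid is a common way to avoid overflow for large negative inputs.

```python
import math


def sigmoid(z):
    """Sigmoid function, written to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_neuron(weights, inputs, bias):
    """Apply the sigmoid to the output of the linear model."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


# Illustrative values chosen so that the linear part equals zero.
weights = [0.5, -1.0, 2.0]
inputs = [2.0, 1.0, 0.5]
bias = -1.0

output = sigmoid_neuron(weights, inputs, bias)
print(output)  # prints 0.5
```

Because the sigmoid maps any real number into the interval between 0 and 1, the output can be read as a probability in a two-class problem.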
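A combination of such neurons defines a hidden layer in a neural network, and the neural
networks are organized as a chain of layers. The output of a hidden layer is described by the
following equation and diagram, where $\sigma$ is the sigmoid non-linearity applied element-wise:

$$h = \sigma(W^\top x + b)$$

[Diagram not available: a hidden layer in a neural network]

Here, the weight, $W$, and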
