Datafy Generative-AI Learning Path
Course Description:
In an era where AI is revolutionizing industries across the globe, Generative AI stands at the forefront of
innovation and disruption in many fields. Whether you are a seasoned professional looking to expand your
skill set, an aspiring enthusiast eager to delve into the world of AI, or a student looking to improve your
job prospects, this course is tailored to meet your needs.
The course combines theoretical background with deep hands-on implementation of the concepts behind
generative AI systems, focusing on text-based Generative AI tools and techniques. It is structured in
seven parts:
1. Prompt Engineering & Large Language Models (LLMs)
2. Retrieval Augmented Generation (RAG) Systems
3. Fine Tuning LLMs for Specific Tasks
4. Developing Chat Apps Powered by LLMs and RAGs (Chat with Data etc)
5. Hugging Face Ecosystem
6. Developing Agents: A Journey Through Innovative AI Tools
7. Mini Project: A Challenge to Build an App Using LLMs
Target Audience:
The core audience of this course is AI researchers and developers:
● Data Scientists | Machine Learning Engineers | Developers | CS Students | AI Enthusiasts
Prerequisites:
● Familiarity with Python, GitHub, Google Colab, APIs, and cloud providers (GCP, Azure, Hugging Face)
● A laptop with a stable internet connection, access to the OpenAI API for hands-on work, and a water bottle
● Background readings will be provided.
Tools & Technologies:
● Cloud Services: OpenAI, GCP, Azure, LLM providers (OpenRouter, Cohere, Hugging Face Hub)
● Frameworks: Python, LangChain, LangSmith, VertexAI, PandasAI, liteLLM, Streamlit and many more
● Deployments: Docker, GCP, Free Public Clouds
● Development Environment: Google Colab, Local Setup with and without Docker, Machines with
GPU Support
1.: Prompt Engineering & Large Language Models (LLMs)
Welcome & Introductions
Session 1: Introduction to Generative AI, Prompts & Large Language Models (LLMs)
● Understanding of Generative AI
● Understanding the significance of LLMs in NLP and their background
● Overview of Developing LLMs (Frameworks, Structure, Hugging Face ecosystem)
Session 2: Prompt Engineering
● Prompt Templating & types of prompts
● Hands-on Prompt Engineering
● OpenAI Prompt Specifications
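The templating idea in this session can be sketched in plain Python; the sentiment-classification task, the template text, and the few-shot examples below are illustrative assumptions, not course materials:

```python
# A minimal prompt-templating sketch: the fixed instruction and few-shot
# examples are separated from the variable user input.

FEW_SHOT_EXAMPLES = [
    {"text": "The battery died after one day.", "label": "negative"},
    {"text": "Setup took two minutes. Love it.", "label": "positive"},
]

TEMPLATE = (
    "Classify the sentiment of the review as positive or negative.\n\n"
    "{examples}\n"
    "Review: {review}\nSentiment:"
)

def build_prompt(review: str) -> str:
    """Render the template with few-shot examples and the new review."""
    examples = "\n".join(
        f"Review: {ex['text']}\nSentiment: {ex['label']}" for ex in FEW_SHOT_EXAMPLES
    )
    return TEMPLATE.format(examples=examples, review=review)

prompt = build_prompt("Stopped working after a week.")
```

The rendered string would be sent as the user message of a chat-completion request; ending the prompt at "Sentiment:" nudges the model to complete with a single label.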
Session 3: Overview & Hands-On with LLM Models
● Overview of LLM Models, Context Length, Tokens (I/O)
● Proprietary vs. Open-Source Models
● Hands-on: LiteLLM, wrapping all models with a unified OpenAI-style specification
● Hands-on: OpenAI API using Python.
● Hands-on: LangChain: Models, Memory, Chains and Agents
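A hedged sketch of the unified OpenAI-style message format that LiteLLM reuses across providers. The model names in the comments are examples; the live calls need API keys, so they are shown commented out:

```python
# OpenAI-style chat payload: a list of role/content dicts. LiteLLM accepts
# this same shape for every provider it wraps (pip install litellm).

def build_messages(system: str, user: str) -> list[dict]:
    """Build a minimal two-turn chat payload."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_messages(
    "You are a concise assistant.",
    "Explain context length in one sentence.",
)

# With LiteLLM the call shape stays identical across providers:
# from litellm import completion
# resp = completion(model="gpt-4o-mini", messages=messages)   # OpenAI
# resp = completion(model="command-r", messages=messages)     # Cohere
# print(resp.choices[0].message.content)
```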
Task
● Generate a tiny QA dataset in the provided Excel template
● Convert the data into a fine-tuning dataset with the provided utilities.
● Push the data to Hugging Face Hub (Optional)
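The conversion step in the task above can be sketched as follows; the `question`/`answer` column names and the file name are assumptions, so adapt them to the provided template and utilities:

```python
import json

# Convert tiny QA rows (e.g. exported from the Excel template) into the
# chat-style JSONL format used for fine-tuning: one JSON object per line,
# each with a "messages" list.

qa_rows = [
    {"question": "What does LLM stand for?", "answer": "Large Language Model."},
    {"question": "What is a token?", "answer": "A small unit of text the model reads."},
]

def to_finetune_record(row: dict) -> dict:
    """One QA pair -> one chat-format training example."""
    return {
        "messages": [
            {"role": "user", "content": row["question"]},
            {"role": "assistant", "content": row["answer"]},
        ]
    }

def write_jsonl(rows: list[dict], path: str) -> None:
    """Write one training example per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(to_finetune_record(row)) + "\n")

write_jsonl(qa_rows, "tiny_qa_finetune.jsonl")
```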
3.: Fine Tuning LLMs for Specific Tasks
Session 1: Fine-Tuning Hands-On
● Fine-tuning models with the OpenAI Fine-tuning API, using your tiny dataset (OpenAI API access
required)
● Fine-tuning models with Google Cloud / Azure / OpenAI
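A hedged sketch of the OpenAI fine-tuning workflow: validate the training file locally first, then upload it and start the job. The API calls need a paid key, so they are shown commented out, and the model name is only an example:

```python
import json

def validate_finetune_file(path: str) -> int:
    """Check every line is valid JSON with a non-empty 'messages' list;
    return the number of training examples."""
    n = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            assert record.get("messages"), "each example needs a messages list"
            n += 1
    return n

# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY from the environment
# upload = client.files.create(file=open("tiny_qa_finetune.jsonl", "rb"),
#                              purpose="fine-tune")
# job = client.fine_tuning.jobs.create(training_file=upload.id,
#                                      model="gpt-4o-mini-2024-07-18")
# print(job.id, job.status)
```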
Session 2: The Fine-Tuning Process Under the Hood
● What is fine-tuning, getting started with hands-on examples
● Data preparation, preprocessing & embeddings
● Choosing a task and dataset for fine-tuning
● Hands on with Fine-tuning components e.g. Data Understanding & basics
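The data preparation step can be illustrated with a toy sketch: format each example into a single training string, then clip it to the context window. The whitespace split below is a stand-in for a real tokenizer (e.g. from Hugging Face transformers), and the context length is deliberately tiny:

```python
# Toy data preparation for fine-tuning: format, "tokenize", truncate.

CONTEXT_LENGTH = 32  # toy limit; real models allow thousands of tokens

def format_example(question: str, answer: str) -> str:
    """Join prompt and completion with explicit markers."""
    return f"### Question:\n{question}\n### Answer:\n{answer}"

def truncate(text: str, max_tokens: int = CONTEXT_LENGTH) -> list[str]:
    """Stand-in tokenizer: split on whitespace and clip to the context window."""
    return text.split()[:max_tokens]

tokens = truncate(format_example("What is LoRA?", "A low-rank fine-tuning method."))
```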
How do you handle models with billions of parameters for fine-tuning?
● Fine-tuning Large Models with LoRA (Low-Rank Adaptation)
● Fine-tuning Large Models with QLoRA (Quantized LoRA)
● Fine-tuning Large Models with AutoTrain (HF Library)
● Advanced (proposed, e.g. DeepSpeed ZeRO, LoRA, Flash Attention)
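The LoRA idea above can be shown numerically: instead of updating the full weight matrix W, train two small matrices A and B so the update is the low-rank product B @ A. The hidden size and rank below are illustrative:

```python
import numpy as np

d, r = 1024, 8                      # hidden size and LoRA rank
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))     # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))                # B starts at zero, so the update starts at 0

W_adapted = W + B @ A               # effective weight during fine-tuning

full_params = d * d                 # trainable values without LoRA
lora_params = d * r + r * d         # trainable values with rank-8 LoRA
```

With these numbers the LoRA update trains about 16k values instead of roughly a million, which is why billion-parameter models become trainable on modest GPUs; QLoRA adds 4-bit quantization of the frozen W on top of this.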
Session 3: Model Evaluation and Validation
● Measuring model performance with human evaluation
● Hands-on exercise: Evaluating your fine-tuned model.
● Deploying your fine-tuned model on HF
● Integrating the model into applications
● Handling real-time requests
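A minimal evaluation sketch for the exercise above: score a held-out QA set with exact match, the simplest automated metric, usually paired with human review. `model_answer` is a stub standing in for a real inference call (OpenAI or a HF pipeline), and the eval set is invented for illustration:

```python
eval_set = [
    {"question": "What does LLM stand for?", "expected": "large language model"},
    {"question": "What is a token?", "expected": "a small unit of text"},
]

def model_answer(question: str) -> str:
    """Stub standing in for the fine-tuned model's prediction."""
    canned = {
        "What does LLM stand for?": "Large Language Model",
        "What is a token?": "a piece of a word",
    }
    return canned[question]

def exact_match_accuracy(dataset: list[dict]) -> float:
    """Fraction of predictions that match the reference after normalization."""
    hits = sum(
        model_answer(row["question"]).strip().lower() == row["expected"]
        for row in dataset
    )
    return hits / len(dataset)

accuracy = exact_match_accuracy(eval_set)
```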
Session 4: Best Practices and Discussion
● Importance of a good-quality fine-tuning dataset
● Ethical & security concerns of LLMs
4.: Developing Chat Apps Powered by LLMs and RAGs (Chat with Data etc)
Welcome & Introductions
Session 1: Introduction to Generative AI & Large Language Models (LLMs)
● Understanding of Generative AI
● Overview of Developing Chat Apps & Architecture (Chat with Data & Private Knowledge Store)
Session 2: A playground to Develop Data Chat Apps
● `pandasai` backend: a Google Colab notebook
● `LangChain` backend: a Google Colab notebook
● `LangChain` backend: a Google Colab notebook for a RAG system (recap)
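A hedged sketch of the `pandasai` backend idea: wrap an ordinary DataFrame so users can query it in natural language. The `SmartDataframe` call needs an LLM API key and the pandasai package, so it is shown commented out; the sample data is invented, and the last line shows the plain-pandas query such an agent would effectively generate:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["north", "south", "north", "west"],
    "revenue": [1200, 800, 1500, 600],
})

# from pandasai import SmartDataframe
# from pandasai.llm import OpenAI
# sdf = SmartDataframe(sales, config={"llm": OpenAI()})
# sdf.chat("Which region has the highest total revenue?")

# The equivalent hand-written pandas query the agent would produce:
top_region = sales.groupby("region")["revenue"].sum().idxmax()
```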
Session 3: A Streamlit Front End
● What is Streamlit
● Overview of the `Chat-with-Data` Streamlit app
● Overview of the `Private Knowledge Store` Streamlit app
● Components of app building & frameworks
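A minimal sketch of a Streamlit chat front end like `Chat-with-Data`. The backend is a stub (`answer`); in the real app it would call the LangChain or pandasai backend. The Streamlit wiring needs `streamlit run app.py`, so it is shown commented out while the testable helpers run anywhere:

```python
def answer(question: str, history: list[dict]) -> str:
    """Stub backend; replace with a LangChain/pandasai call."""
    return f"You asked: {question}"

def push_turn(history: list[dict], role: str, content: str) -> list[dict]:
    """Append one chat turn; Streamlit keeps `history` in st.session_state."""
    history.append({"role": role, "content": content})
    return history

# import streamlit as st
# st.title("Chat with Data")
# if "history" not in st.session_state:
#     st.session_state.history = []
# if question := st.chat_input("Ask about your data"):
#     push_turn(st.session_state.history, "user", question)
#     push_turn(st.session_state.history, "assistant",
#               answer(question, st.session_state.history))
# for turn in st.session_state.history:
#     st.chat_message(turn["role"]).write(turn["content"])
```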
Session 4: Deployment
● What is Docker? Containerising the app with industry best practices
● Build and deploy the app in Docker using Docker Compose
● Deployment to Public Streamlit Cloud
● Deployment to Google Cloud using Cloud Run
5.: Generative AI Hugging Face Ecosystem
Welcome & Introductions
Explore Hugging Face Ecosystem: NLP models, datasets, and tools. Set up profiles, generate tokens, and
navigate tasks. Highlighted models like gpt2, Llama-2-7b-chat-hf, with details on cards, files, training, and
deployment. Tutorials, docs, spaces, tasks, and community resources provided.
Session 1: Account Setup and Overview of the Platform
● Create and Set up Hugging Face Profile.
● Create API token and connect using Google Colab
● Hugging Face Docs
● Hugging Face Spaces
● Hugging Face Chat
● Hugging Face Community
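The token setup above can be sketched as follows: keep the API token in an environment variable (or a Colab secret) rather than pasting it into the notebook. The `hf_xxx` value is a placeholder, and the `huggingface_hub` calls need a real token, so they are shown commented out:

```python
import os

def get_hf_token(default: str = "hf_xxx") -> str:
    """Read the token from the environment; fall back to a placeholder.
    Real Hugging Face tokens also start with the 'hf_' prefix."""
    return os.environ.get("HF_TOKEN", default)

token = get_hf_token()

# from huggingface_hub import login, whoami
# login(token=token)
# print(whoami()["name"])   # confirms the token works
```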
Session 2: Models and Datasets
● Explore and discuss tasks, libraries, datasets, languages, licenses, sizes, and sub-tasks.
● Notable models include gpt2 and Llama-2-7b-chat-hf, with brief details on model cards,
files/versions, training, deployment, and usage in Transformers.
● Tutorials cover deploying an LLM on a Hugging Face Inference Endpoint.
● Downloading and uploading a fine-tuned model to the Hugging Face Hub
● Uploading the fine-tuning data
● Making inference using the fine-tuned model.
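The upload/download round trip can be sketched as below. The Hub calls need a token and network access, so they are shown commented out; the username and model name are examples, and the repo-naming convention is the part shown directly:

```python
def repo_id(username: str, model_name: str) -> str:
    """Hub repos are addressed as '<username>/<model-name>'."""
    return f"{username}/{model_name}"

rid = repo_id("your-username", "tiny-qa-gpt2")

# Upload, after fine-tuning with transformers:
# model.push_to_hub(rid)
# tokenizer.push_to_hub(rid)

# Download and run inference elsewhere:
# from transformers import pipeline
# generator = pipeline("text-generation", model=rid)
# print(generator("What does LLM stand for?", max_new_tokens=30))
```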
6.: Developing Agents: A Journey Through Innovative AI Tools
Welcome & Introductions
Session 1: Chat With Data - Exploring the Fusion of OpenAI and Pandas AI
Overview:
In this session, we delve into the fascinating world of Chat With Data, a groundbreaking chat app that
seamlessly blends OpenAI's natural language processing capabilities with the data management prowess of
Pandas AI. Users are empowered to engage in interactive conversations with their datasets, opening up
new possibilities for efficient data exploration.
Key Features:
● OpenAI for Natural Language Processing
● Pandas AI for Data Querying, Analysis, and Manipulation
● Interactive Conversations for Data Exploration
Session 2: AutoGPT - Revolutionizing Agent Development
Overview:
AutoGPT takes the spotlight in this session as a versatile toolkit designed to elevate agent development. Its
modular framework allows users to focus on building, testing, and monitoring their agents' progress. Join
us in exploring the cutting-edge features of AutoGPT and witness firsthand its impact on the AI revolution.
Key Features:
● Modular Framework for Agent Development
● Building, Testing, and Monitoring Capabilities
● Leading Codebase in the Open-Source Ecosystem
Session 3: GPT Engineer - Crafting Codebases with Intelligence
Overview:
In this session, we turn our attention to GPT Engineer, an ingenious code generation AI. GPT Engineer is
not just a code generator; it's a tool that emphasizes adaptability, extension, and customization. Discover
how users can shape entire codebases based on their preferences, ushering in a new era of intelligent code
construction.
Key Features:
● Adaptive Code Generation
● Emphasis on Extension and Customization
● Shaping Codebases According to User Preferences
Session 4: GPT Researcher - Unleashing Autonomous Agents for Online Research
Overview:
Our final session explores the capabilities of GPT Researcher, an autonomous agent tailored for online
research tasks. This AI marvel produces detailed and unbiased reports with customizable options. Learn
how GPT Researcher prioritizes relevant resources and employs a parallelized work approach for enhanced
speed and stability in online research.
Key Features:
● Autonomous Agent for Online Research
● Detailed and Unbiased Reports
● Customization Options for Research Tasks
Conclusion: Embark on a transformative journey with 'Developing Agents: A Journey Through Innovative
AI Tools.' Explore revolutionary AI tools redefining data interaction, agent development, code generation,
and online research. Join the conversation and witness the era reshaping the boundaries of artificial
intelligence.