Datafy Generative-AI Learning Path
Course Description:
In an era where AI is revolutionizing industries across the globe, Generative AI stands at the forefront of
innovation and disruption in many fields. Whether you are a seasoned professional looking to expand your
skill set, an aspiring enthusiast eager to delve into the world of AI, or a student looking to improve your
job prospects, this course is tailored to meet your needs.
The course combines theoretical background with deep hands-on implementation of the concepts behind
generative AI systems, focusing on text-based Generative AI tools and techniques. It is structured in
seven parts:
1. Prompt Engineering & Large Language Models (LLMs)
2. Retrieval Augmented Generation (RAG) Systems
3. Fine Tuning LLMs for Specific Tasks
4. Developing Chat Apps Powered by LLMs and RAGs (Chat with Data etc)
5. Hugging Face Ecosystem
6. Developing Agents: A Journey Through Innovative AI Tools
7. Mini Project: A Challenge to Build an App Using LLMs
Target Audience:
The core audience of this course is AI researchers and developers:
● Data Scientists | Machine Learning Engineers | Developers | CS Students | AI Enthusiasts
Prerequisites:
● Familiarity with Python, GitHub, Google Colab, APIs, and cloud providers (GCP, Azure, Hugging Face)
● A laptop with a stable internet connection, access to the OpenAI API for hands-on work, and a water bottle
● Background readings will be provided.
Tools & Technologies:
● Cloud Services: OpenAI, GCP, Azure, LLM providers (OpenRouter, Cohere, Hugging Face Hub)
● Frameworks: Python, LangChain, LangSmith, VertexAI, PandasAI, liteLLM, Streamlit and many more
● Deployments: Docker, GCP, Free Public Clouds
● Development Environment: Google Colab, Local Setup with and without Docker, Machines with
GPU Support
1.: Prompt Engineering & Large Language Models (LLMs)
Welcome & Introductions
Session 1: Introduction to Generative AI, Prompts & Large Language Models (LLMs)
● Understanding of Generative AI
● Understanding the significance of LLMs in NLP and their background
● Overview of Developing LLMs (Frameworks, Structure, Hugging Face ecosystem)
Session 2: Prompt Engineering
● Prompt Templating & types of prompts
● Hands-on Prompt Engineering
● OpenAI Prompt Specifications
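The templating idea in this session can be sketched in plain Python; the sentiment-classification task, the template text, and the few-shot examples below are illustrative assumptions, not course materials:

```python
# A minimal prompt-templating sketch: the fixed instruction and few-shot
# examples are separated from the variable user input.

FEW_SHOT_EXAMPLES = [
    {"text": "The battery died after one day.", "label": "negative"},
    {"text": "Setup took two minutes. Love it.", "label": "positive"},
]

TEMPLATE = (
    "Classify the sentiment of the review as positive or negative.\n\n"
    "{examples}\n"
    "Review: {review}\nSentiment:"
)

def build_prompt(review: str) -> str:
    """Render the template with few-shot examples and the new review."""
    examples = "\n".join(
        f"Review: {ex['text']}\nSentiment: {ex['label']}" for ex in FEW_SHOT_EXAMPLES
    )
    return TEMPLATE.format(examples=examples, review=review)

prompt = build_prompt("Stopped working after a week.")
```

The rendered string would be sent as the user message of a chat-completion request; ending the prompt at "Sentiment:" nudges the model to complete with a single label.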
Session 3: Overview & Hands-On with LLM Models
● Overview of LLM Models, Context Length, Tokens (I/O)
● Proprietary vs. Open-Source Models
● Hands-on: LiteLLM, wrapping all models with a unified OpenAI-style specification
● Hands-on: OpenAI API using Python.
● Hands-on: LangChain: Models, Memory, Chains and Agents
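A hedged sketch of the unified OpenAI-style message format that LiteLLM reuses across providers. The model names in the comments are examples; the live calls need API keys, so they are shown commented out:

```python
# OpenAI-style chat payload: a list of role/content dicts. LiteLLM accepts
# this same shape for every provider it wraps (pip install litellm).

def build_messages(system: str, user: str) -> list[dict]:
    """Build a minimal two-turn chat payload."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_messages(
    "You are a concise assistant.",
    "Explain context length in one sentence.",
)

# With LiteLLM the call shape stays identical across providers:
# from litellm import completion
# resp = completion(model="gpt-4o-mini", messages=messages)   # OpenAI
# resp = completion(model="command-r", messages=messages)     # Cohere
# print(resp.choices[0].message.content)
```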
Task
● Generate a tiny QA dataset in the provided Excel template
● Convert the data into a fine-tuning dataset with the provided utilities.
● Push the data to Hugging Face Hub (Optional)
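The conversion step in the task above can be sketched as follows; the `question`/`answer` column names and the file name are assumptions, so adapt them to the provided template and utilities:

```python
import json

# Convert tiny QA rows (e.g. exported from the Excel template) into the
# chat-style JSONL format used for fine-tuning: one JSON object per line,
# each with a "messages" list.

qa_rows = [
    {"question": "What does LLM stand for?", "answer": "Large Language Model."},
    {"question": "What is a token?", "answer": "A small unit of text the model reads."},
]

def to_finetune_record(row: dict) -> dict:
    """One QA pair -> one chat-format training example."""
    return {
        "messages": [
            {"role": "user", "content": row["question"]},
            {"role": "assistant", "content": row["answer"]},
        ]
    }

def write_jsonl(rows: list[dict], path: str) -> None:
    """Write one training example per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(to_finetune_record(row)) + "\n")

write_jsonl(qa_rows, "tiny_qa_finetune.jsonl")
```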
3.: Fine Tuning LLMs for Specific Tasks
Session 1: Fine-Tuning Hands-On
● Fine-tuning models with the OpenAI Fine-tuning API, using your tiny dataset (OpenAI API access
required)
● Fine-tuning models with Google Cloud / Azure / OpenAI
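A hedged sketch of the OpenAI fine-tuning workflow: validate the training file locally first, then upload it and start the job. The API calls need a paid key, so they are shown commented out, and the model name is only an example:

```python
import json

def validate_finetune_file(path: str) -> int:
    """Check every line is valid JSON with a non-empty 'messages' list;
    return the number of training examples."""
    n = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            assert record.get("messages"), "each example needs a messages list"
            n += 1
    return n

# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY from the environment
# upload = client.files.create(file=open("tiny_qa_finetune.jsonl", "rb"),
#                              purpose="fine-tune")
# job = client.fine_tuning.jobs.create(training_file=upload.id,
#                                      model="gpt-4o-mini-2024-07-18")
# print(job.id, job.status)
```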
Session 2: The Fine-Tuning Process Under the Hood
● What is fine-tuning, getting started with hands-on examples
● Data preparation, preprocessing & embeddings
● Choosing a task and dataset for fine-tuning
● Hands on with Fine-tuning components e.g. Data Understanding & basics
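The data preparation step can be illustrated with a toy sketch: format each example into a single training string, then clip it to the context window. The whitespace split below is a stand-in for a real tokenizer (e.g. from Hugging Face transformers), and the context length is deliberately tiny:

```python
# Toy data preparation for fine-tuning: format, "tokenize", truncate.

CONTEXT_LENGTH = 32  # toy limit; real models allow thousands of tokens

def format_example(question: str, answer: str) -> str:
    """Join prompt and completion with explicit markers."""
    return f"### Question:\n{question}\n### Answer:\n{answer}"

def truncate(text: str, max_tokens: int = CONTEXT_LENGTH) -> list[str]:
    """Stand-in tokenizer: split on whitespace and clip to the context window."""
    return text.split()[:max_tokens]

tokens = truncate(format_example("What is LoRA?", "A low-rank fine-tuning method."))
```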
How do you handle models with billions of parameters for fine-tuning?
● Fine-tuning Large Models with LoRA (Low-Rank Adaptation)
● Fine-tuning Large Models with QLoRA (Quantized LoRA)
● Fine-tuning Large Models with AutoTrain (HF Library)
● Advanced (proposed, e.g. DeepSpeed ZeRO, LoRA, Flash Attention)
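The LoRA idea above can be shown numerically: instead of updating the full weight matrix W, train two small matrices A and B so the update is the low-rank product B @ A. The hidden size and rank below are illustrative:

```python
import numpy as np

d, r = 1024, 8                      # hidden size and LoRA rank
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))     # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))                # B starts at zero, so the update starts at 0

W_adapted = W + B @ A               # effective weight during fine-tuning

full_params = d * d                 # trainable values without LoRA
lora_params = d * r + r * d         # trainable values with rank-8 LoRA
```

With these numbers the LoRA update trains about 16k values instead of roughly a million, which is why billion-parameter models become trainable on modest GPUs; QLoRA adds 4-bit quantization of the frozen W on top of this.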
Session 3: Model Evaluation and Validation
● Measuring model performance with human evaluation
● Hands-on exercise: Evaluating your fine-tuned model.
● Deploying your fine-tuned model on HF
● Integrating the model into applications
● Handling real-time requests
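A minimal evaluation sketch for the exercise above: score a held-out QA set with exact match, the simplest automated metric, usually paired with human review. `model_answer` is a stub standing in for a real inference call (OpenAI or a HF pipeline), and the eval set is invented for illustration:

```python
eval_set = [
    {"question": "What does LLM stand for?", "expected": "large language model"},
    {"question": "What is a token?", "expected": "a small unit of text"},
]

def model_answer(question: str) -> str:
    """Stub standing in for the fine-tuned model's prediction."""
    canned = {
        "What does LLM stand for?": "Large Language Model",
        "What is a token?": "a piece of a word",
    }
    return canned[question]

def exact_match_accuracy(dataset: list[dict]) -> float:
    """Fraction of predictions that match the reference after normalization."""
    hits = sum(
        model_answer(row["question"]).strip().lower() == row["expected"]
        for row in dataset
    )
    return hits / len(dataset)

accuracy = exact_match_accuracy(eval_set)
```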
Session 4: Best Practices and Discussion
● Importance of a good-quality fine-tuning dataset
● Ethical & security concerns of LLMs
4.: Developing Chat Apps Powered by LLMs and RAGs (Chat with Data etc)
Welcome & Introductions
Session 1: Introduction to Generative AI & Large Language Models (LLMs)
● Understanding of Generative AI
● Overview of Developing Chat Apps & Architecture (Chat with Data & Private Knowledge Store)
Session 2: A playground to Develop Data Chat Apps
● `pandasai` backend: a Google Colab notebook
● `LangChain` backend: a Google Colab notebook
● `LangChain` backend: a Google Colab notebook for a RAG system (recap)
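A hedged sketch of the `pandasai` backend idea: wrap an ordinary DataFrame so users can query it in natural language. The `SmartDataframe` call needs an LLM API key and the pandasai package, so it is shown commented out; the sample data is invented, and the last line shows the plain-pandas query such an agent would effectively generate:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["north", "south", "north", "west"],
    "revenue": [1200, 800, 1500, 600],
})

# from pandasai import SmartDataframe
# from pandasai.llm import OpenAI
# sdf = SmartDataframe(sales, config={"llm": OpenAI()})
# sdf.chat("Which region has the highest total revenue?")

# The equivalent hand-written pandas query the agent would produce:
top_region = sales.groupby("region")["revenue"].sum().idxmax()
```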
Session 3: A Streamlit Front End
● What is Streamlit
● Overview of the `Chat-with-Data` Streamlit app
● Overview of the `Private Knowledge Store` Streamlit app
● Components of app building & frameworks
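A minimal sketch of a Streamlit chat front end like `Chat-with-Data`. The backend is a stub (`answer`); in the real app it would call the LangChain or pandasai backend. The Streamlit wiring needs `streamlit run app.py`, so it is shown commented out while the testable helpers run anywhere:

```python
def answer(question: str, history: list[dict]) -> str:
    """Stub backend; replace with a LangChain/pandasai call."""
    return f"You asked: {question}"

def push_turn(history: list[dict], role: str, content: str) -> list[dict]:
    """Append one chat turn; Streamlit keeps `history` in st.session_state."""
    history.append({"role": role, "content": content})
    return history

# import streamlit as st
# st.title("Chat with Data")
# if "history" not in st.session_state:
#     st.session_state.history = []
# if question := st.chat_input("Ask about your data"):
#     push_turn(st.session_state.history, "user", question)
#     push_turn(st.session_state.history, "assistant",
#               answer(question, st.session_state.history))
# for turn in st.session_state.history:
#     st.chat_message(turn["role"]).write(turn["content"])
```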
Session 4: Deployment
● What is Docker? Containerising the app with industry best practices
● Build and deploy the app in Docker using Docker Compose
● Deployment to Public Streamlit Cloud
● Deployment to Google Cloud using Cloud Run
5.: Generative AI Hugging Face Ecosystem
Welcome & Introductions
Explore Hugging Face Ecosystem: NLP models, datasets, and tools. Set up profiles, generate tokens, and
navigate tasks. Highlighted models like gpt2, Llama-2-7b-chat-hf, with details on cards, files, training, and
deployment. Tutorials, docs, spaces, tasks, and community resources provided.
Session 1: Account Setup and Overview of the Platform
● Create and Set up Hugging Face Profile.
● Create API token and connect using Google Colab
● Hugging Face Docs
● Hugging Face Spaces
● Hugging Face Chat
● Hugging Face Community
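The token setup above can be sketched as follows: keep the API token in an environment variable (or a Colab secret) rather than pasting it into the notebook. The `hf_xxx` value is a placeholder, and the `huggingface_hub` calls need a real token, so they are shown commented out:

```python
import os

def get_hf_token(default: str = "hf_xxx") -> str:
    """Read the token from the environment; fall back to a placeholder.
    Real Hugging Face tokens also start with the 'hf_' prefix."""
    return os.environ.get("HF_TOKEN", default)

token = get_hf_token()

# from huggingface_hub import login, whoami
# login(token=token)
# print(whoami()["name"])   # confirms the token works
```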
Session 2: Models and Datasets
● Explore and discuss tasks, libraries, datasets, languages, licenses, sizes, and sub-tasks.
● Notable models include gpt2 and Llama-2-7b-chat-hf, with brief details on model cards,
files/versions, training, deployment, and usage in Transformers.
● Tutorials cover deploying an LLM on a Hugging Face Inference Endpoint.
● Downloading and uploading a fine-tuned model to the Hugging Face Hub
● Uploading the fine-tuning data
● Making inference using the fine-tuned model.
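The upload/download round trip can be sketched as below. The Hub calls need a token and network access, so they are shown commented out; the username and model name are examples, and the repo-naming convention is the part shown directly:

```python
def repo_id(username: str, model_name: str) -> str:
    """Hub repos are addressed as '<username>/<model-name>'."""
    return f"{username}/{model_name}"

rid = repo_id("your-username", "tiny-qa-gpt2")

# Upload, after fine-tuning with transformers:
# model.push_to_hub(rid)
# tokenizer.push_to_hub(rid)

# Download and run inference elsewhere:
# from transformers import pipeline
# generator = pipeline("text-generation", model=rid)
# print(generator("What does LLM stand for?", max_new_tokens=30))
```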
6.: Developing Agents: A Journey Through Innovative AI Tools
Welcome & Introductions
Session 1: Chat With Data - Exploring the Fusion of OpenAI and Pandas AI
Overview:
In this session, we delve into the fascinating world of Chat With Data, a groundbreaking chat app that
seamlessly blends OpenAI's natural language processing capabilities with the data management prowess of
Pandas AI. Users are empowered to engage in interactive conversations with their datasets, opening up
new possibilities for efficient data exploration.
Key Features:
● OpenAI for Natural Language Processing
● Pandas AI for Data Querying, Analysis, and Manipulation
● Interactive Conversations for Data Exploration
Session 2: AutoGPT - Revolutionizing Agent Development
Overview:
AutoGPT takes the spotlight in this session as a versatile toolkit designed to elevate agent development. Its
modular framework allows users to focus on building, testing, and monitoring their agents' progress. Join
us in exploring the cutting-edge features of AutoGPT and witness firsthand its impact on the AI revolution.
Key Features:
● Modular Framework for Agent Development
● Building, Testing, and Monitoring Capabilities
● Leading Codebase in the Open-Source Ecosystem
Session 3: GPT Engineer - Crafting Codebases with Intelligence
Overview:
In this session, we turn our attention to GPT Engineer, an ingenious code generation AI. GPT Engineer is
not just a code generator; it's a tool that emphasizes adaptability, extension, and customization. Discover
how users can shape entire codebases based on their preferences, ushering in a new era of intelligent code
construction.
Key Features:
● Adaptive Code Generation
● Emphasis on Extension and Customization
● Shaping Codebases According to User Preferences
Session 4: GPT Researcher - Unleashing Autonomous Agents for Online Research
Overview:
Our final session explores the capabilities of GPT Researcher, an autonomous agent tailored for online
research tasks. This AI marvel produces detailed and unbiased reports with customizable options. Learn
how GPT Researcher prioritizes relevant resources and employs a parallelized work approach for enhanced
speed and stability in online research.
Key Features:
● Autonomous Agent for Online Research
● Detailed and Unbiased Reports
● Customization Options for Research Tasks
Conclusion: Embark on a transformative journey with 'Developing Agents: A Journey Through Innovative
AI Tools.' Explore revolutionary AI tools redefining data interaction, agent development, code generation,
and online research. Join the conversation and witness the era reshaping the boundaries of artificial
intelligence.