E6895 Advanced Big Data Analytics and AI Lecture 1:

Introduction to Advanced Big Data and AI

Ching-Yung Lin, Ph.D.


Adjunct Professor, Depts. of Electrical Engineering and Computer Science

January 19, 2024


E6895 Advanced Big Data and AI — Lecture 1 © CY Lin, 2024 Columbia University
1997
Agenda:
•Introduction to IBM System G
•Answering the Questions raised by FINRA
•Large-Scale Graph Computing:
•System G Graph Database
•System G Graph Analytics
•Demo of System G Graph Tools
•Relationship Extraction
•Machine Reasoning
•Discussion

Network Science Team © 2013 IBM Corporation


2011 — 1997
Jeopardy
2015



Figure: cognitive functions (Perception, Classification, Reasoning & Strategy, Observation, Memory)



Do we need AI?

Who will be our caregiver? Where can we find help?

“Single Child”: Finalist of the 26th National Photo Contest, China

All developed and some developing countries have been facing a labor shortage crisis, which is becoming more serious every day.

https://www.youtube.com/watch?v=BV8qFeZxZPE

2023 AI Summit at New York Javits Center
Graphen: The largest booth

World’s First AI Digital Human for Daily Life

Meet Aiia
• Hardware-Software Integrated Local AI ‘Brain’.
• Privacy / Individual / Personal
• Speaks English, Chinese, Japanese, and Spanish
• Avatars with Personality & Emotion
• Eye Contact / Facial Expression
• Integrating with Payment, Mobile Apps, etc.

Graphen Robotics Hardware Systems

Aiia Kiosk Aiia Know Aiia Robot (Adam)

32” 43” and 55”; Classic and Glass 32”

Digital Human Application Scenarios

• Drinks, Restaurants, Supermarkets, etc.
• Hotels, Train Stations, Travel Agents
• Retail Stores
• Hospitals, Nursing Homes
• Financial Institutions
• Automotive

Six Ava demos at New York Convention Center (April 2023 @ NY Auto Show)

AI Knowledge Worker
Examples:

• Instant reference tool for medication dosages, side effects, and interactions, reducing the risk of medication errors.

• Patient education: helping nurses provide accurate, understandable explanations of medical conditions and treatments.

Question: What is the infusion time for 1 unit of Packed Red Blood Cells?
Aiia Nurse Assistant: PRBCs are a blood product used to replace erythrocytes; infusion time for 1 unit is usually between 2 and 4 hours.
Source: The answer is obtained by retrieving page 158 in the provided PDF, which is the RN Exam textbook.

→ Aiia answered 90% of the questions correctly on the New York RN License Exam.
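The answer above is grounded in a retrieved textbook page. A minimal sketch of such a retrieval step, assuming the textbook has already been split into per-page strings and using plain bag-of-words cosine similarity (a simple stand-in, not Aiia's actual retriever), could look like this. The page snippets and the `retrieve_page` helper are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase word tokens; a real system would add stemming and stopword removal
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    # Cosine similarity between two term-count vectors
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve_page(question, pages):
    """Return (page_number, score) for the page most similar to the question.

    `pages` maps page numbers to page text (hypothetical input format).
    """
    q = Counter(tokenize(question))
    scored = {num: cosine(q, Counter(tokenize(text))) for num, text in pages.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]

# Made-up page snippets standing in for the RN textbook PDF
pages = {
    157: "Whole blood and plasma products are stored at controlled temperatures.",
    158: "Packed red blood cells: infuse one unit over 2 to 4 hours.",
    159: "Monitor the patient for transfusion reactions such as fever and chills.",
}
print(retrieve_page("What is the infusion time for 1 unit of packed red blood cells?", pages))
```

In a full assistant, the retrieved page text would then be passed to a language model to phrase the answer and cite the page, which is the pattern the example above describes.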

Digital Human Application Scenarios

Aiia Examples

Aiia Financial Advisor

Retail Aiia

Digital Human Application Scenarios

Aiia Examples

Aiia as Sales Assistant

Aiia helps with ordering

Nurse Aiia

① Patient Care:
➢ Handle routine work for nurses: discharge information, facility information, forms to fill, patient and caregiver education, social worker roles, etc.
➢ Provide medical and health information on how to cope with the patient’s situation
➢ Provide information on economics, travel, hobbies, and preferences through conversations between patients and avatars
➢ Brain vitalization through enhanced daily conversation, maintenance and promotion of health, and support as a “personal companion”
➢ Entertainment
② Operational Support for Nurse Station:
➢ Patient personal information (including genetic information, treatment history, drug administration information, etc.)
➢ Real-time sensor data monitoring of patients (body temperature, blood pressure, sleep, turnover status, awake status, etc.)

Course Outline

The presentation schedule will be adjusted based on the distribution of tasks.

Course Grading
▪ Task: 45%
— Teamwork: 1 - 2 students per team
• Choose a task from 60 potential tasks
• Language Requirement: Python, JavaScript, C/C++, Java, Perl
• 3 milestones (45%): Presentation, Slides, Report and Source Code

▪ Final Project: 30%
— Teamwork: 1 - 2 students per team
• Building a System
• Final Report (paper, up to 12 pages)
• Workshop Presentation and Online Video
• Open Source

▪ Research Study: 15%
• 3 research paper presentations related to Advanced AI: Slides

▪ Class Participation: 10%
• Attendance
• Discussion (Asking/Answering Questions)

The Task Sign-Up Spreadsheet is available until midnight 1/26.

Course Information
▪ Website:
http://www.ee.columbia.edu/~cylin/course/bigdata/

▪ Textbook:
-- None, but reference book(s) and/or articles/papers will be provided each lecture.

Contact Information and TAs

▪ Professor Lin:
▪ Office Hours and Location:
Friday 9:30pm – 10:00pm (lecture room) or by appointment (500 Fifth
Ave., Suite 2420, New York, NY 10110)
▪ Contact: [email protected]

▪ TAs:
• Shiyu Wang (sw3601)
• TBD

Special Request — thanks:

• If you may not take the class, please do not sign up on the task spreadsheet.
• Please remove your name from the task sign-up sheet immediately when you drop the
class.
• Please drop the class as early as possible, if you are not planning to take the class.

Reference Book

Reference Book

Reference Foundation

• Graph Middleware: Parallel Prog. Lib., Power Optimization, GPU Optimization
• Graph Database: Native Store, GBase
• Graph Visualization: Multivariate Graph, Dynamic Graph, Big Graph
• Graph Analytics: Topological Analysis, Matching and Search, Path and Flow
• Spatiotemporal Analytics: Spatiotemporal Mining, Spatiotemporal Indexing
• Machine Learning: Deep Learning Tools, Visual and Text Sentiment Tools, Anomaly Detection Tools
• Machine Reasoning: Bayesian Networks, Game Theory Tools, Multimodal Analysis Platform
• Mobile Cognition: iOS Cognition Tools, Robot Cognition Tools

Figure: four technology layers (1. Graph Database Technologies, 2. Network Analytics Technologies, 3. Machine Learning Technologies, 4. Machine Reasoning Technologies) alongside cognitive functions (Sensing & Memory, Observation, Perception & Representation, Judgment, Reasoning & Strategy)

Reference Advanced AI + Big Data Platform

•Terabyte-sized native GraphDB, supporting trillions of vertices and edges
•ACID-compliant, distributed graph database and analytics
•Asynchronous job scheduling (both Autonomous ML and GraphDB)
•Scalable, distributed analytics, modular and expandable through plugins
•Cluster, replication, and high availability with disaster recovery
•Error and event logging, monitoring, backup, and recovery

Task Area 1: Full-Brain Machines

Area 1 ‘Cognitive Machine’ Tasks List:
A1: Deep Video Understanding (Visual + Knowledge) — Face Recognition, Feeling
Recognition, and Interaction
A2: Deep Video Understanding (Language + Knowledge) — Speech Recognition,
Gesture Recognition, and Feeling Recognition
A3: Deep Video Understanding — Event and Story Understanding
A4: Humanized Conversation — Personality-Based Conversations
A5: Autonomous Robot Learning of Physical Environment
A6: Autonomous Task Learning via Mimicking
A7: Digital Human for Fashion
A8: Digital Human for Tourism
A9: Digital Human for Retail
A10: Digital Human for Media and Marketing
A11: Feeling and Art Recognition
A12: Creative Writing & Story Telling
A13: Knowledge Learning & Construction
A14: Dreams — Simulating Brain functions while sleeping
A15: Self-Consciousness, Ethics, and Morality

Robot Cognition
Figure: Sensor (Audio, Visual, Infra-red); Multi-Modality Information; Understanding (Emotion Perception); Artificial Empathy; Motion (Expression, Emotion Contagion, Motor Mimicry)

Emotion and Cheers

How a Robot Cheers You Up

A1-A3. Deep Video Understanding
A complete system combining these for video understanding

Potential target: NIST Deep Video Understanding 2022


A4. Humanized Conversation with Personality

Description:
• Virtual agents are progressing fast and entering people’s lives. However, the voices of these agents are mostly ‘flat’, like machines.
• The first step toward making virtual agents human-like is to add a “personality” aspect to conversation.

Goal:
• Create a Personality-based Speaking Model for Conversation Text

Advanced Goal:
• Modify the Speech Tones to Reflect Personality

A5. Autonomous Learning of Physical Environments
Description:
• Simultaneous Localization and Mapping
(SLAM) refers to the problem of
incrementally building the map of a
previously unseen environment while at the
same time locating the robot on it.
• Active localization research has shown that choosing actions that minimize localization uncertainty yields better localization than a passive approach.
• Active SLAM applies this idea to the SLAM problem. It can be defined as the paradigm of controlling a robot that is performing SLAM so as to reduce the uncertainty of both its localization and the map’s representation.

Goal:
• Robot Awareness of Physical
Environments
• Robot Action with Environments
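The active-localization idea above (pick the action that minimizes expected uncertainty) can be sketched on a toy problem. This sketch assumes a known 1-D cyclic map of "door"/"wall" cells, deterministic motion, and a sensor that reports the true cell label with probability 0.9; it covers only the localization half of active SLAM, and the map and helper names are hypothetical.

```python
import math

def normalize(p):
    s = sum(p)
    return [x / s for x in p]

def entropy(p):
    # Shannon entropy in bits; zero-probability cells contribute nothing
    return -sum(x * math.log2(x) for x in p if x > 0)

def move(belief, step):
    # Deterministic motion on a cyclic 1-D grid: mass at cell i moves to i + step
    n = len(belief)
    return [belief[(i - step) % n] for i in range(n)]

def sense(belief, world, z, p_hit=0.9):
    # Bayes update: the sensor reports the true cell label with probability p_hit
    likelihood = [p_hit if cell == z else 1 - p_hit for cell in world]
    return normalize([b * l for b, l in zip(belief, likelihood)])

def expected_entropy(belief, world, step, p_hit=0.9):
    # Average posterior entropy over every observation the robot could receive after moving
    predicted = move(belief, step)
    total = 0.0
    for z in set(world):
        pz = sum(b * (p_hit if cell == z else 1 - p_hit) for b, cell in zip(predicted, world))
        if pz > 0:
            total += pz * entropy(sense(predicted, world, z, p_hit))
    return total

def choose_action(belief, world, actions=(-1, 0, 1)):
    # Active localization: pick the move that minimizes expected uncertainty
    return min(actions, key=lambda a: expected_entropy(belief, world, a))

# Hypothetical map; the robot has just seen a door, so it is at cell 0 or cell 3
world = ["door", "wall", "wall", "door", "door", "wall"]
belief = [0.5, 0.0, 0.0, 0.5, 0.0, 0.0]
print(choose_action(belief, world))
```

Staying put (step 0) or stepping left leaves both hypotheses looking identical, whereas stepping right lands on a wall under one hypothesis and a door under the other, so the sketch prefers step +1. A passive robot would not make that distinction.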

A6. Autonomous Learning of Tasks via Mimicking

Description:
• Machines learn to act based on human actions
• Watch human activity in an environment and then learn how to behave on its own.

Goal:
• Observation and Action Extraction
• Reinforcement Learning to correct its own actions
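The two goals of A6, mimicking observed actions and then correcting them with reinforcement learning, can be illustrated with a toy sketch. It assumes a five-cell corridor task, hypothetical (state, action) demonstrations, behavioral cloning by majority vote, and tabular Q-learning whose initial values favor the mimicked actions. This is one possible combination, not a method prescribed by the course.

```python
import random
from collections import Counter, defaultdict

N_STATES = 5           # corridor cells 0..4; cell 4 is the goal (terminal)
ACTIONS = (-1, 1)      # move left or move right

def step(state, action):
    # Deterministic corridor dynamics with walls at both ends
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def clone_policy(demos):
    """Behavioral cloning: majority vote over the (state, action) pairs observed from a human."""
    votes = defaultdict(Counter)
    for s, a in demos:
        votes[s][a] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}

def refine_with_q_learning(prior, episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning whose initial Q-values slightly favor the mimicked actions."""
    rng = random.Random(seed)
    q = {(s, a): (0.1 if prior.get(s) == a else 0.0) for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap episode length
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            nxt, r, done = step(s, a)
            target = r if done else r + gamma * max(q[(nxt, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = nxt
            if done:
                break
    return {s: max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES - 1)}

# Hypothetical demonstrations: the human mostly moves right but twice went left in cell 2
demos = [(0, 1), (1, 1), (2, -1), (2, 1), (2, -1), (3, 1)]
prior = clone_policy(demos)
print(prior)                          # the cloned policy copies the mistake in cell 2
print(refine_with_q_learning(prior))  # reward feedback corrects it
```

Seeding Q-values from the demonstrations lets the agent start from the human's behavior rather than from scratch, while the reward signal is what "corrects its own actions" where the demonstration was wrong.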

A7 - A10. Digital Human

A7. Digital Human for Fashion Industry

A8. Digital Human for Tourism Industry

A9. Digital Human for Retail Industry

A10. Digital Human for Media and Marketing Industry

• Learning Industry Knowledge
• Local ‘Brain’
• Integrating with Mobile Apps
• Multiple Languages
• Avatars with Personality & Emotion
• Reconstructing and Connecting with Real-World Objects
• (Optional) Integrating with Physical Robotics

A11. Feeling and Art Recognition
• Background
• Let machines feel and appreciate art like humans

• Project Goal
• A team will work on subjective machine feeling of visual information
• Allow machines to interpret art.

A12. Creative Writing and Story Telling

• Background
• Overwhelming real-time information on
media.
• Automatically writing and telling a story based on a set of news articles.

• Project Goal
• A team will design and implement a platform that conducts data mining on various media related to a field.
• Using NLP to summarize key text
information.
• Using visualization to create charts and
graphs.
• Automatically create descriptions
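The step "Using NLP to summarize key text information" could start from a simple extractive baseline. This sketch scores each sentence by how frequent its words are across the whole text and keeps the top k sentences in their original order. The lack of stopword removal and the made-up news text are simplifying assumptions, not a required approach.

```python
import re
from collections import Counter

def summarize(text, k=1):
    """Frequency-based extractive summary: keep the k highest-scoring sentences in original order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    freq = Counter(re.findall(r"[a-z]+", text.lower()))
    # Score = sum of corpus frequencies of the sentence's words (favors longer, on-topic sentences)
    scores = [sum(freq[w] for w in re.findall(r"[a-z]+", s.lower())) for s in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))

# Made-up news snippet for illustration
news = ("Stocks rose today. Stocks rose on strong earnings and stocks hit records. "
        "Weather was mild.")
print(summarize(news, k=1))
```

Because raw sums favor long sentences, a practical system would normalize by length, drop stopwords, or move to an abstractive model. The baseline mainly shows where summarization fits in the pipeline.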

A13. Knowledge Learning and Construction

“Airplane”
“Grandma”
“Grandma is in Taiwan”
“Auntie is also in Taiwan”
“I like grandma”
“I like grandpa”
The boy said:
“I like grandma and grandpa”

Image Source: http://wonderforgood.com/category/visual-storytelling/

Memory with Knowledge Graphs

Figure: a knowledge graph connecting “I”, “Airplane”, “Grandma”, “Grandpa”, “Auntie”, and “Taiwan” through relations such as “see”, “visit”, “in”, “like”, and “and”

The graph platform works like the human mind, connecting the dots when comprehending.
Image Source: http://wonderforgood.com/category/visual-storytelling/
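"Connecting the dots" can be sketched as a graph search over subject-relation-object triples. This sketch assumes the facts come from the boy's sentences on the previous slide (the “Airplane” link is omitted because its relation cannot be recovered from the text). It uses breadth-first search over an undirected view of the graph and is an illustration, not the course platform's API.

```python
from collections import defaultdict, deque

# Triples taken from the boy's sentences; relation names are illustrative
triples = [
    ("Grandma", "in", "Taiwan"),
    ("Auntie", "in", "Taiwan"),
    ("I", "like", "Grandma"),
    ("I", "like", "Grandpa"),
]

def build_graph(triples):
    # Undirected adjacency so facts can be linked in either direction
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
        graph[obj].append((rel, subj))
    return graph

def connect(graph, start, goal):
    """Breadth-first search for the shortest chain of facts linking two concepts."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for _, nbr in graph[node]:
            if nbr not in parents:
                parents[nbr] = node
                queue.append(nbr)
    return None  # the two concepts are not connected

graph = build_graph(triples)
print(connect(graph, "Auntie", "Grandpa"))
```

Even though no single sentence relates Auntie to Grandpa, the search links them through Taiwan, Grandma, and I. That kind of inferred connection is what the slide means by the graph working like a mind.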

A14. Dreams Simulating Brain Functions while Sleeping

• Background
• When we sleep, our brain works on ‘storing’ the massive information we see, hear, and learn during the daytime
• Our brain later organizes this information (in a bizarre way) to create dreams.

• Project Goal
• A team will work on simulating how the brain functions during sleep.
• Create ‘dreams’.

A15. Self-consciousness, Ethics and Morality

• Background
• Consciousness is how robots know their own existence
• Can a robot have self-identification?

• Project Goal
• Simulate empathy
• Simulate ethics and morality

Task Area 2: Financial Advisors
• Market Data Analysis and Investment Targets
• Advanced Dynamic ‘Know Your Customer’
• Optimized Personalized Investment Strategy
• Bank-Customer Interaction Strategy

Figure: customer pyramid with tiers High, Mass Affluent, Upper Middle, Middle, and Lower Middle, grouped into three segments:
• High-End Customers (Private Bank / Special Investment Services)
• Targeted Customers (Consumer Bank Services): $15K - $1M; Customer #: 30M~50M in China
• General Public (Consumer Bank Services): Customer #: > 1B in China

Area 2 ‘Finance Advisor’ Tasks List:
B1: Market Intelligence — Constructing Financial Knowledge Graphs
B2: Market Intelligence — Company Environmental, Social, and Governance Performance
B3: Market Intelligence — Event Linkage and Impact Prediction
B4: Market Intelligence — Alpha Generation from Alternative Sources
B5: Advanced KYC — Customer Profiling based on Personality, Needs, and Value
B6: Advanced KYC — Customer Behavior Prediction
B7: Investment Strategy — AI Trader (Foreign Exchange)
B8: Investment Strategy — AI Trader (Stock Markets)
B9: Investment Strategy — Automatic Dynamic Asset Allocation
B10: Customer Interaction — Customer Communication Strategies
B11: Customer Interaction — Insurance Product Sales & Marketing Strategy
B12: Automatic Story Telling for Marketing
B13: Automatic Market Competition Analysis
B14: Automatic Consumer Sales Leads Finding
B15: Human Capital Growth Recommendations

What is a Robo-Advisor?
A robo-advisor is a new type of wealth management service. Based on the risk level and investment goals provided by the investor, it uses a series of ‘smart algorithms’ to calculate optimal investment suggestions.
▪ Non-biased
▪ Low investment threshold
▪ Low starting entry money
▪ Low agent fee

Robo-advisors directly managed about $19 billion as of December 2014. By 2020, the global assets under management of robo-advisers were forecast to grow to an estimated US$255B.

Features:
• Strong dependence on technology, algorithms, and financial theory
• Diversified investment, maximizing long-term return
• Personalized portfolio allocation

Example: Harry Markowitz’s Modern Portfolio Theory
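As a worked illustration of Markowitz mean-variance reasoning, the minimum-variance weight for a two-asset portfolio has a closed form, w1 = (σ2² − ρσ1σ2) / (σ1² + σ2² − 2ρσ1σ2). The sketch below assumes made-up returns, volatilities, and correlation for a stock ETF and a bond ETF, and it places no constraint on short sales.

```python
import math

def portfolio_stats(w1, mu, sigma, rho):
    """Expected return and volatility of a two-asset portfolio with weights (w1, 1 - w1)."""
    w2 = 1.0 - w1
    ret = w1 * mu[0] + w2 * mu[1]
    var = (w1 * sigma[0]) ** 2 + (w2 * sigma[1]) ** 2 + 2 * w1 * w2 * rho * sigma[0] * sigma[1]
    return ret, math.sqrt(var)

def min_variance_weight(sigma, rho):
    """Closed-form weight on asset 1 that minimizes portfolio variance (short sales allowed)."""
    s1, s2 = sigma
    cov = rho * s1 * s2
    return (s2 ** 2 - cov) / (s1 ** 2 + s2 ** 2 - 2 * cov)

# Hypothetical inputs: a stock ETF (asset 1) and a bond ETF (asset 2)
mu = (0.08, 0.03)      # expected annual returns
sigma = (0.20, 0.05)   # annual volatilities
rho = 0.1              # correlation between the two
w = min_variance_weight(sigma, rho)
ret, vol = portfolio_stats(w, mu, sigma, rho)
print(f"weight on stock ETF: {w:.4f}, expected return: {ret:.4f}, volatility: {vol:.4f}")
```

The minimum-variance point is only one end of the efficient frontier. A robo-advisor could move along the frontier toward higher-return weights according to the customer's risk score.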

Typical Steps of Robo-Advisory

Most robo-advisor platforms are built on modern portfolio theory, using Exchange-Traded Funds (ETFs) to build portfolios.

1. Customer Profiling: design a questionnaire; score risk capacity and risk willingness based on the answers to the questionnaire.
2. Construct Portfolio: portfolio strategy; type analysis; optimum allocation.
3. Tracing Portfolio: Monte Carlo simulation; judge whether the goal is achieved; suggest adjustments.
4. Receiving Benefits: save tax by using losses to offset gains; the outcome is highly related to income; investment income tax (not applicable in China).
5. Rebalance: set a tolerance level to avoid over-adjustment.

According to a Wells Fargo survey, only 16% of people in the US in their 20s and 30s are willing to interact with investment consultants. The rest prefer these types of AI consultants.
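Two of the steps above can be sketched together: tracing the portfolio with Monte Carlo simulation, and rebalancing only when drift exceeds a tolerance level. The normal annual-return model, the 5% tolerance band, and all input numbers are assumptions for illustration, not parameters from any particular robo-advisor.

```python
import random

def goal_probability(initial, monthly_saving, years, mu, sigma, goal, n_paths=10_000, seed=42):
    """Monte Carlo estimate of P(final wealth >= goal), assuming i.i.d. normal annual returns."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_paths):
        wealth = initial
        for _ in range(years):
            # Apply one year's random return, then add that year's savings
            wealth = wealth * (1 + rng.gauss(mu, sigma)) + 12 * monthly_saving
        hits += wealth >= goal
    return hits / n_paths

def needs_rebalance(values, targets, tolerance=0.05):
    """True if any asset weight drifts more than `tolerance` from its target weight."""
    total = sum(values)
    return any(abs(v / total - t) > tolerance for v, t in zip(values, targets))

def rebalance_trades(values, targets):
    """Amount to buy (+) or sell (-) per asset to restore the target weights."""
    total = sum(values)
    return [t * total - v for v, t in zip(values, targets)]

# Hypothetical customer: $10,000 today, $500/month, 10 years, $100,000 goal
p = goal_probability(10_000, 500, 10, mu=0.06, sigma=0.12, goal=100_000)
print(f"Estimated chance of reaching the goal: {p:.1%}")

# Hypothetical 60/40 target that has drifted to 70/30
values, targets = [70_000, 30_000], [0.6, 0.4]
if needs_rebalance(values, targets):
    print(rebalance_trades(values, targets))
```

The tolerance band is what keeps the advisor from "over-adjusting": small drifts are ignored, and trades happen only once a weight moves outside the band.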

Four Steps to Use Big Data Cognitive Analysis for Robo-Advisors

1. Precise Market Analysis
• Analyze the market performance of various kinds of funds
• Analyze domestic and international financial and economic changes and how they may impact CPI, PPI, or GDP
• Use machine learning and deep learning, based on historical economic numbers, to find out how factors impact financial markets
Data: Product Data, Market Data, Historical Economic Data, Industry-related Data

2. Dynamically Know Your Customer
• Customer profiling, e.g., based on IPQ (Individual Profile Questionnaire), feedback, risk capacity, and risk willingness
• Understand what the customer really wants based on their past behaviors interacting with the bank
Data: Customer Data, Behavior Data / Interaction Data

3. Optimized Personalized Investment Strategy
• Strategy computation and optimization based on personal history
• Demonstrate / simulate ‘what ifs’ when the portfolio has a different allocation
• Explain the ‘what ifs’ to the customer
Data: Customer Data, Market Data

4. Bank-Customer Interaction
• Create and predict the customer interaction strategy, including when, by what method, and with what content to interact with the customer, to achieve maximum customer and bank benefit
Data: Customer Data, Interaction Data

B1 - B4: Market Intelligence

Figure: news to be analyzed and the impact score of the news on each stock

Knowledge Graphs

Example of Building and Utilizing Knowledge Graph

• Background
• For artificial intelligence and better search, many search companies have created knowledge graphs.
• However, there are few knowledge graphs in the public domain.
• Project Goal
• A team will create knowledge graphs in several application domains (e.g., Finance, Medical, etc.) by crawling public web pages, news, Twitter, Wikipedia, etc.
• A team will need to design a way to efficiently crawl datasets, store them in limited space, and quickly search for the required data using indexing functionality.
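For the "store and quickly search" requirement, one common design is to index triples by each position (subject, predicate, object), so that any partially specified query touches only matching entries. The class below is a minimal in-memory sketch under that assumption. `TripleStore` and its methods are hypothetical names, and a production system would persist the indexes on disk.

```python
from collections import defaultdict

class TripleStore:
    """In-memory triple store with one index per position (subject, predicate, object)."""

    def __init__(self):
        self.triples = set()
        self.index = {pos: defaultdict(set) for pos in ("s", "p", "o")}

    def add(self, s, p, o):
        t = (s, p, o)
        if t not in self.triples:  # ignore duplicates from repeated crawls
            self.triples.add(t)
            self.index["s"][s].add(t)
            self.index["p"][p].add(t)
            self.index["o"][o].add(t)

    def query(self, s=None, p=None, o=None):
        """Return matching triples; None acts as a wildcard. Intersects the bound indexes."""
        bound = [self.index[pos].get(val, set())
                 for pos, val in (("s", s), ("p", p), ("o", o)) if val is not None]
        if not bound:
            return set(self.triples)
        result = set(min(bound, key=len))  # start from the smallest candidate set
        for candidates in bound:
            result &= candidates
        return result

# Example facts of the kind a crawler might extract
kg = TripleStore()
kg.add("Apple Inc.", "headquartered_in", "Cupertino")
kg.add("Apple Inc.", "industry", "Consumer electronics")
kg.add("Pfizer", "industry", "Pharmaceuticals")
kg.add("Pfizer", "industry", "Pharmaceuticals")  # duplicate, ignored
print(sorted(kg.query(p="industry")))
```

Keeping one index per position trades extra storage for fast lookups. Under the "limited space" constraint, a team might instead store integer IDs for entity names and index those.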

B5 - B6: Advanced KYC — Customer Profiling and Behavior Prediction

B7 - B9: AI Trader

AI Trader

B10 - B11: Customer Interaction

Figure: optimizing customer interactions
• Hundreds of products/campaigns, with incompatible combinations: how much of each product/campaign?
• Telesales, mail, email, office, etc.: done through which channel?
• Several millions of customers: to which customers? When?
• Nightly batch run selecting over 1.2M actions for the next days
• Experts doing what-if analysis to improve the process

B12. Automatic Story Telling for Marketing

• Background
• Using raw materials in an organization to create marketing materials

• Project Goal
• A team will design and implement a platform that uses data in an organization
• Automatically organize the information on a particular topic
• Using visualization to create charts and graphs.
• Manually or automatically create descriptions
• Create a video to tell the story

B13. Automatic Market Competition Analysis

• Background
• Automatically searching the internet to find competitors’ information

• Project Goal
• Automatic extraction of key information
• Automatic comparison of the company’s key products and services
• Finding financial performance, if available.

B14. Automatic Finding Sales Leads
• Project Goal
• Using Public Raw Materials on Social
Media to find potential customers

B15. Human Capital Growth Recommendation
• Project Goal
• Automatically analyzing a person’s personality and goals
• Analyzing similar successful people from public data sources, e.g., LinkedIn.
• Creating knowledge graphs of what makes people successful in reaching their goals
• Suggesting what to learn to be competitive

Task Area 3: Healthy Life

Area 3 ‘Healthy Life’ Tasks List:
C1: Precision Health — Gene and Protein Analysis of Network, Pathway, and
Biomarkers
C2: Large-Scale System for Human Genome Analysis
C3: Genomic Mutations and Function Prediction
C4: Druggable Targets for Precision Medicine
C5: AI for Human Consciousness – EEG and AIoT
C6: AI for Human Consciousness – fMRI and Connectome
C7: Virtual Nurse -- Learning Medical Knowledge
C8: Virtual Doctor – Advanced Learning Medical Knowledge
C9: Virtual Doctor – Conversations
C10: Microbe and Disease Knowledge Graph
C11: Knowledge Graphs for Gene Interaction and Disease Relationships
C12: Generating Gene or Immuno Therapy
C13: Molecular Drug Synthesis via Deep Learning
C14: Protein Interaction Predictor
C15: AI Exploration and Understanding of Aging

Life is composed of graphs of atoms

Central Dogma of Biology

The Emergence of Digital Biology

AI Tools power Digital Biology

C1: Precision Health - Multiple Omics
• Background
• Utilizing whole genome information can provide valuable information to
patients
• Goal
• Study open source whole genome data and explore their impact on
disease prediction.

C2. Large-Scale System for Human Genome Analysis

• Background
• Size of whole genome data is
very large
• Goal
• Study big data systems for genomic analysis and other omics analysis

C3. Mutations and Function Prediction
• Background
• We have been
monitoring COVID-19
worldwide mutations
since Feb 2020.
• More than 12,000,000
virus strains have been
sequenced
• Continuous monitoring of large-scale data is becoming more and more challenging.
• Goal
• Keep exploring key algorithms for virus mutation classification.
• Use protein function prediction tools to estimate the impact of mutated viruses.
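A first step in mutation classification is listing how each sequenced strain differs from a reference, then matching those differences against lineage-defining mutations. This sketch assumes the sequences are already aligned to the reference and have equal length (real pipelines run an aligner first). The lineage names and their defining mutations are made up.

```python
def call_substitutions(reference, sample):
    """List point substitutions as '<ref><pos><alt>' (1-based), assuming pre-aligned, equal-length sequences."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [f"{r}{i}{a}" for i, (r, a) in enumerate(zip(reference, sample), start=1)
            if r != a and a not in "N-"]  # skip ambiguous bases (N) and alignment gaps (-)

def classify(mutations, lineages):
    """Assign the lineage whose defining mutations are all present, preferring the most specific one."""
    found = set(mutations)
    matches = [name for name, defining in lineages.items() if defining <= found]
    return max(matches, key=lambda name: len(lineages[name]), default="unassigned")

# Hypothetical lineage definitions (not real virus lineages)
lineages = {
    "lineage-X": {"G4A"},
    "lineage-X.1": {"G4A", "C9T"},
    "lineage-Y": {"T2C"},
}
muts = call_substitutions("ATGGCTAGCT", "ATGACTAGTT")
print(muts, classify(muts, lineages))
```

Rule-based matching like this scales to millions of strains because each strain is compared only against a small set of definitions. The impact of each substitution would then be estimated separately with protein function prediction tools, as the goal states.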
C4. Druggable Targets for Precision Medicine
• Background
• Next-generation
medicine will be based
on personal genome
data, proteome data,
and pathway prediction.
• Finding appropriate drugs for diseases remains a continuing challenge
• Goal
• Utilize knowledge graphs of diseases and drugs.
• Use the pathway
analysis of patients to
identify key variants
• Analyze the potential
drug targets

Building the First Human Consciousness Monitoring and Prediction Open Platform

C5. AI for Human Consciousness -- EEG and AIoT
• Background
• Human brain activities can be observed from sensing data
• Goal
• Monitoring and Predicting Human Consciousness based on sensors,
such as EEG sensors, biosensors, vital information, etc.

C6. AI for Human Consciousness – fMRI and Connectome
• Background
• Human brain activities can be clearly observed from imaging data
• Goal
• Monitoring and Predicting Human Consciousness based on medical
images, such as CT, fMRI, Connectome, etc.

Figure: fMRI and connectome images

C7. Virtual Nurse – Medical Knowledge Learning
• Background
• Big Data and AI technologies have made significant progress lately, making it possible to learn knowledge from diverse sources.
• Goal
• Establish an AI system that can potentially pass the New York State nurse exam.

C8. Virtual Doctor – Advanced Medical Knowledge Learning
• Background
• Big Data and Deep Learning technologies have progressed significantly lately; AI systems built on them can probably pass the Doctor Qualification Exams
• Goal
• Exploring Large Language Models and open Medical and Health datasets to
learn medical knowledge

C9. Virtual Doctor – Conversations
• Background
• With deep medical knowledge, it is becoming possible to build virtual doctors who can interact with patients
• Goal
• Prototyping virtual doctors who can communicate with patients, observing multi-modality information and answering patients’ questions.

C10. Microbe and Disease Knowledge Graph

• Background
• Microbes, also called microorganisms, are tiny living things that are found all around us and are too small to be seen by the naked eye. They live in water, soil, air, and the human body. The most common types are bacteria, viruses, and fungi.
• Research indicates that microbes and human health are strongly correlated.
• Goal
• Find the similarity of microbes and
similarity of diseases.
• Build the correlation network of
microbes and diseases to help
diagnose potential health
conditions.
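The goals above (similar microbes, similar diseases, then a correlation network) can be prototyped with set overlap. This sketch assumes a binary microbe-disease association table and uses Jaccard similarity with an arbitrary 0.5 threshold. Placeholder names stand in for real data, and a research system might use a different similarity measure.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_network(assoc, threshold=0.5):
    """Connect items whose association sets overlap by at least `threshold` (Jaccard)."""
    edges = {}
    for x, y in combinations(sorted(assoc), 2):
        score = jaccard(assoc[x], assoc[y])
        if score >= threshold:
            edges[(x, y)] = score
    return edges

def invert(assoc):
    """Turn microbe -> diseases into disease -> microbes."""
    inverted = {}
    for microbe, diseases in assoc.items():
        for d in diseases:
            inverted.setdefault(d, set()).add(microbe)
    return inverted

# Hypothetical microbe -> associated-disease table (placeholder names, not real findings)
microbe_to_disease = {
    "microbe_A": {"disease_1", "disease_2"},
    "microbe_B": {"disease_1", "disease_2", "disease_3"},
    "microbe_C": {"disease_4"},
}
print(similarity_network(microbe_to_disease))          # microbe-microbe similarity edges
print(similarity_network(invert(microbe_to_disease)))  # disease-disease similarity edges
```

The same function builds both networks because disease similarity is just microbe similarity computed on the transposed table.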
C11. Knowledge Graph of Gene Interaction and Disease Similarity

• Background
• Understanding the genetic networks
and their associations in diseases is
one of the important objectives of
biological researchers. The
knowledge graph serves as a
powerful tool to investigate this
topic.
• Goal
• Construct and visualize knowledge
graphs demonstrating associations
among genes based on disease
similarity.

C12. Generating Gene or Immuno Therapy
• Background
• CRISPR, which allows precise editing of a cell’s genome by inducing double-stranded DNA (dsDNA) breaks at specific loci, is an efficient and cost-effective technological tool.
• However, designing an ideal sgRNA that targets the intended cell DNA without any off-target effects is challenging.
• We need to use external data to build a deep learning algorithm to solve this problem.
• Goal
• Use TDC open-source data to predict which sgRNAs are able to edit or repair cell DNA, and design an autoencoder or GAN to generate template sequences.
C13. Molecular Drug Synthesis via Deep Learning

• Background
• Generating small molecules with deep learning is not hard. However, these molecules are often hard to manufacture or synthesize. We need to design an algorithm to simulate chemical reactions and predict the feasibility of molecular synthesis.

• Goal
• Use open-source data to predict whether a molecule can be produced, and try to simulate the synthesis processes from molecular properties.

C14. Protein Interaction Prediction

• Background
• Protein-protein interactions
(PPIs) are useful for
understanding signaling
cascades, predicting protein
function, associating proteins
with disease and fathoming drug
mechanism of action.
• Currently, only ~10% of human
PPIs may be known, and about
one-third of human proteins have
no known interactions.

C15. AI Exploration and Understanding of Aging

• Background
• Aging has a major impact on humans. Recent studies have been providing more and more information on how aging functions and whether it is possible to delay or even reverse some of its effects

• Goal
• Study the mechanisms causing aging at the gene and protein level.
• Use the protein structure prediction, protein-protein interaction, and protein-drug binding tools to explore

Task Area 4: Green Earth and Advanced Topics

Area 4 ‘Green Earth’ Tasks List:

D1: Distributed Solar Power Load Forecasting and Predictive Maintenance


D2: Distributed Wind Power Load Forecasting and Predictive Maintenance
D3: Power Flow Optimization
D4: Smart Grid Pricing Strategy
D5: AI for Novel Nuclear Fusion Power
D6: Stimulating Crop Growth
D7: Electric Car Sensing and Predictive Maintenance
D8: Autonomous Driving
D9: Smart Cabin of Electric Vehicles
D10: Social Policy Monitoring
D11: International Relationships and Policy Monitoring
D12: AI Chips – AI System on Chip
D13: AI Chips – Neural Processing Units
D14: Exploration in Immersive Environment
D15: Computer Vision Enhanced Immersive Environment

AI + Big Data Makes Smart Grid Possible

➢ The variability and intermittency of
renewable generation.
➢ Decreased frequency response capability
and decreasing system inertia.
➢ Changing load patterns and
unpredictability.
➢ The need to manage a vastly increasing
number of endpoints.
➢ Growing cyber attack risks to the electric
grid.
Department of Energy, Smart Grid System Report, Nov. 2018
➢ Power system modeling
➢ Power system classification
➢ Point of event occurrence into one unified frame
➢ Equipment sensitivity to the event disturbance
➢ Power quality event detection and characterization
➢ E.g., sag, swell, outage, harmonic, notch, flicker, impulse, etc.

[Figure: Voltage Waveform → Fourier and Wavelet-transform Based Feature Extraction → Detection and Classification of a Disturbance → Fuzzy Neural Network Decision Making → Detection and Characterization/Classification; Equipment Sensitivity Study; Event Location Prediction]
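A common first step in detecting the power quality events listed above is to compute the RMS voltage over each cycle and compare it, in per-unit (pu), against fixed thresholds. The sketch below uses the magnitude thresholds commonly associated with IEEE Std 1159: interruption below 0.1 pu, sag from 0.1 to 0.9 pu, and swell above 1.1 pu. It ignores the standard's duration categories. The synthetic sinusoid (64 samples per cycle) and its per-cycle amplitudes are assumptions. The code is a simple stand-in for the Fourier/wavelet feature extraction and fuzzy neural network classifier named on this slide, not a reproduction of them.

```python
import math

def cycle_rms(samples, samples_per_cycle):
    """Return the RMS value of each complete cycle of the waveform."""
    rms = []
    for start in range(0, len(samples) - samples_per_cycle + 1, samples_per_cycle):
        window = samples[start:start + samples_per_cycle]
        rms.append(math.sqrt(sum(x * x for x in window) / samples_per_cycle))
    return rms

def classify_cycle(rms_pu):
    """Label one cycle by its RMS voltage in per-unit (1.0 = nominal)."""
    if rms_pu < 0.1:
        return "interruption"
    if rms_pu < 0.9:
        return "sag"
    if rms_pu > 1.1:
        return "swell"
    return "normal"

# Synthetic waveform: 64 samples per cycle, peak sqrt(2) so nominal RMS = 1.0 pu.
SPC = 64
amplitudes = [1.0, 1.0, 0.5, 0.5, 1.3, 0.0, 1.0]  # assumed per-cycle scale factors
wave = [a * math.sqrt(2) * math.sin(2 * math.pi * n / SPC)
        for a in amplitudes for n in range(SPC)]

labels = [classify_cycle(r) for r in cycle_rms(wave, SPC)]
print(labels)
```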
➢ Apply machine learning to historical power system data to reduce operating costs and failure risk
➢ Avoid or minimize downtime and reduce associated costs
➢ Optimize periodic maintenance operations
➢ Health indicator by machine learning
➢ Classification: the health indicator predicts the probability of failure in the future.
➢ Regression approach: the health indicator predicts how much time is left before the next failure.

[Figure: Historical power system data → Feature extraction → Machine learning algorithms → Failure Prediction → Health indicator]
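The regression formulation of the health indicator can be sketched by fitting a least-squares trend to a degradation signal. Extrapolating that trend to a failure threshold gives an estimate of the time left before failure. The weekly readings, the 0.50 threshold, and the linear-trend assumption are all hypothetical. Real equipment usually needs features and models learned from historical failure data.

```python
def fit_line(ts, ys):
    """Ordinary least-squares fit y = slope * t + intercept."""
    n = len(ts)
    mean_t, mean_y = sum(ts) / n, sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t

def remaining_time(ts, ys, threshold):
    """Extrapolate the linear trend to the failure threshold and return the
    time left after the last reading (None if the indicator is not rising)."""
    slope, intercept = fit_line(ts, ys)
    if slope <= 0:
        return None
    t_fail = (threshold - intercept) / slope
    return max(0.0, t_fail - ts[-1])

weeks = [0, 1, 2, 3, 4, 5]
indicator = [0.10, 0.14, 0.18, 0.22, 0.26, 0.30]  # hypothetical, higher = worse
print(f"estimated weeks to failure: {remaining_time(weeks, indicator, threshold=0.50):.1f}")
```

The classification formulation would instead train a model to output the probability of failure within a fixed horizon.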
D1. Distributed Solar Power Load Forecasting and Maintenance
• Background
• Conditions at solar power plants vary and are time-dependent.
Power companies need accurate predictions of these conditions
and must rule out anomalies quickly.

• Task Goal
• Predict solar power generation based on weather data
• Anomaly detection for solar power plants
• Predictive maintenance of solar power plants.
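The first two task goals can be sketched together. The code fits a least-squares model of plant output against irradiance from weather data. It then flags readings whose residual stands far apart from the rest, using a robust z-score based on the median absolute deviation with the common 3.5 cutoff. The single-feature linear model and the hourly readings are assumptions. A real forecaster would add temperature, cloud cover, and time-of-day features.

```python
import statistics

def fit_line(xs, ys):
    """Least-squares fit of power = slope * irradiance + intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def flag_anomalies(xs, ys, cutoff=3.5):
    """Return indices whose residual has a robust (MAD-based) z-score above cutoff."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    med = statistics.median(residuals)
    mad = statistics.median([abs(r - med) for r in residuals])
    if mad == 0:
        return []
    return [i for i, r in enumerate(residuals)
            if 0.6745 * abs(r - med) / mad > cutoff]

# Hypothetical hourly readings: irradiance in W/m^2, plant output in kW.
irradiance = [100, 200, 300, 400, 500, 600, 700, 800]
power = [21, 39, 61, 80, 99, 121, 70, 159]  # the 700 W/m^2 hour looks faulty

slope, intercept = fit_line(irradiance, power)
print(f"forecast at 650 W/m^2: {slope * 650 + intercept:.1f} kW")
print("anomalous hours:", flag_anomalies(irradiance, power))
```

The faulty hour also pulls the fitted line down. Refitting after excluding flagged hours would give a cleaner forecast.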

D2. Distributed Wind Power Load Forecasting and Maintenance

• Background
• Conditions at wind power plants vary and are time-dependent.
Power companies need accurate predictions of these conditions
and must rule out anomalies quickly.

• Task Goal
• Predict wind power generation based on weather data
• Anomaly detection for wind power plants
• Predictive maintenance of wind power plants.
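Wind differs from solar mainly in the physics of the power curve. Between cut-in and cut-out wind speeds, turbine output grows with the cube of wind speed, P = 0.5 ρ A C_p v³, up to the rated power. The sketch below turns a wind speed into expected output and flags measured output that falls far below the curve. The turbine parameters (90 m rotor, C_p = 0.40, cut-in 3 m/s, cut-out 25 m/s, 2 MW rating) and the 30% shortfall rule are illustrative assumptions, not values from the slides.

```python
import math

AIR_DENSITY = 1.225          # kg/m^3, standard sea-level air
ROTOR_DIAMETER_M = 90.0      # assumed turbine
POWER_COEFF = 0.40           # assumed aerodynamic efficiency C_p
CUT_IN_MS, CUT_OUT_MS = 3.0, 25.0
RATED_POWER_KW = 2000.0

def expected_power_kw(v):
    """Idealized power curve: 0 below cut-in or at/above cut-out, otherwise
    0.5 * rho * A * C_p * v^3, capped at rated power."""
    if v < CUT_IN_MS or v >= CUT_OUT_MS:
        return 0.0
    area = math.pi * (ROTOR_DIAMETER_M / 2) ** 2
    watts = 0.5 * AIR_DENSITY * area * POWER_COEFF * v ** 3
    return min(watts / 1000.0, RATED_POWER_KW)

def underperforming(speeds, measured_kw, shortfall=0.3):
    """Indices where measured output is more than `shortfall` (30% by default) below the curve."""
    flags = []
    for i, (v, p) in enumerate(zip(speeds, measured_kw)):
        expected = expected_power_kw(v)
        if expected > 0 and p < (1 - shortfall) * expected:
            flags.append(i)
    return flags

# Hypothetical 10-minute averages: wind speed (m/s) and measured output (kW).
speeds = [2.0, 6.0, 9.0, 14.0, 26.0]
measured = [0.0, 250.0, 420.0, 1990.0, 0.0]
for v in speeds:
    print(f"{v:4.1f} m/s -> {expected_power_kw(v):7.1f} kW expected")
print("underperforming readings:", underperforming(speeds, measured))
```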

D3. Power Flow Optimization

• Background
• Transmission is key to a Low-
Cost Decarbonized US Grid

• Task Goal
• Study the optimal strategies
for power flow
• Simulate various scenarios

https://www.greentechmedia.com/articles/read/study-transmission-is-the-key-to-a-low-cost-decarbonized-u.s-grid
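A standard starting point for studying power flow strategies is the linearized DC power flow approximation. It assumes voltage magnitudes near 1 pu, lossless lines, and small angle differences. Under these assumptions, the flow on line i-j is P_ij = (θ_i - θ_j) / x_ij, and the bus injections satisfy P = Bθ, with one slack bus fixed at θ = 0. The three-bus network, reactances, and injections below are made-up values. Optimization of flows (e.g., DC optimal power flow with line limits) would build on this model.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def dc_power_flow(n_bus, lines, injections, slack=0):
    """lines: (from_bus, to_bus, reactance_pu); injections: net P per bus in pu.
    Returns bus angles (rad) and line flows (pu) under the DC approximation."""
    B = [[0.0] * n_bus for _ in range(n_bus)]
    for i, j, x in lines:
        b = 1.0 / x
        B[i][i] += b
        B[j][j] += b
        B[i][j] -= b
        B[j][i] -= b
    keep = [k for k in range(n_bus) if k != slack]  # drop the slack bus row/column
    theta_red = solve([[B[r][c] for c in keep] for r in keep],
                      [injections[r] for r in keep])
    theta = [0.0] * n_bus
    for k, t in zip(keep, theta_red):
        theta[k] = t
    flows = {(i, j): (theta[i] - theta[j]) / x for i, j, x in lines}
    return theta, flows

lines = [(0, 1, 0.10), (0, 2, 0.20), (1, 2, 0.25)]
injections = [0.0, 0.5, -1.5]  # bus 0 is slack (entry ignored); bus 1 generates; bus 2 is a load
theta, flows = dc_power_flow(3, lines, injections)
for (i, j), p in flows.items():
    print(f"line {i}-{j}: {p:.3f} pu")
```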

D4. Smart Grid Pricing Strategy

• Background
• Pricing strategy can be used to shape consumer behavior.
• As more and more cars and IoT devices rely on electric power,
it is critical to influence customer behavior to optimize use of
the power grid.

• Task Goal
• Implement methodologies that can help change customer behavior.
• Game theory is a possible solution.
• Other solutions should also be considered.
Chen et al. A cheat-proof game theoretic demand
response scheme for smart grids, IEEE ICC 2012
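Here is a toy two-period energy-scheduling game showing how pricing can shape consumer behavior. It is a simplified illustration of demand-response games in general, not the cheat-proof scheme of Chen et al. cited above. Each consumer splits a flexible energy amount between a peak and an off-peak period. The unit price in each period rises linearly with total load in that period, and consumers take turns playing best responses. The base loads and flexible amounts are assumptions.

```python
def best_response(others_peak, others_off, energy):
    """Peak-period share y minimizing (O1 + y) * y + (O2 + E - y) * (E - y),
    i.e. the consumer's bill when each period's unit price is k * total load
    (the price slope k does not change the minimizer). Clamped to [0, E]."""
    y = (others_off - others_peak + 2 * energy) / 4
    return min(max(y, 0.0), energy)

def play(base_peak, base_off, energies, rounds=50):
    """Sequential best-response dynamics, starting with all flexible load at peak."""
    peak_share = list(energies)
    total_flex = sum(energies)
    for _ in range(rounds):
        for i, e in enumerate(energies):
            # load from base demand plus every other consumer, per period
            others_peak = base_peak + sum(peak_share) - peak_share[i]
            others_off = base_off + (total_flex - sum(peak_share)) - (e - peak_share[i])
            peak_share[i] = best_response(others_peak, others_off, e)
    return peak_share

energies = [4.0, 6.0, 10.0]        # flexible kWh per household (assumed)
base_peak, base_off = 30.0, 10.0   # inflexible load in each period (assumed)
peak_share = play(base_peak, base_off, energies)
load_peak = base_peak + sum(peak_share)
load_off = base_off + sum(energies) - sum(peak_share)
print(f"peak load {load_peak:.2f} (was {base_peak + sum(energies):.2f}), off-peak load {load_off:.2f}")
```

With prices linear in total load, this game has an exact potential, so the turn-by-turn best responses settle to an equilibrium. In this example, peak load falls from 50 to about 32.7.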

D5. AI for Novel Nuclear Fusion Power
• Background
• Desktop-size Nuclear Fusion is becoming a reality

• Task Goal
• Study and apply AI technology to advance novel desktop-size nuclear fusion
power.
https://alpharing.com

D6. Stimulating Crop Growth

• Background
• Machine learning for image-based recognition of crop growth
status and for crop management strategy

• Task Goal
• Establish an ideal growth model of crops
• Use image recognition to determine crop growth status, mark
unwanted growth, and suggest where to prune.
• Use climate and soil data to suggest irrigation and
fertilization
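For the irrigation suggestion, one widely used agronomic rule is the FAO-56 crop-coefficient method. Crop water demand is ET_c = K_c × ET_0, where ET_0 is reference evapotranspiration computed from climate data and K_c depends on the crop and its growth stage. The sketch below runs a simple daily soil-water balance and suggests irrigation whenever depletion exceeds an allowed amount. The K_c value, the allowable depletion, the daily data, and treating all rain as effective are assumptions.

```python
def irrigation_plan(et0_mm, rain_mm, kc, allowable_depletion_mm):
    """Daily soil-water balance; returns (day, irrigation_mm) suggestions.
    Depletion grows by crop ET (kc * ET0) and shrinks with rain; when it
    exceeds the allowable depletion, irrigate back to field capacity."""
    depletion = 0.0
    plan = []
    for day, (et0, rain) in enumerate(zip(et0_mm, rain_mm)):
        etc = kc * et0
        depletion = max(0.0, depletion + etc - rain)  # rain beyond field capacity drains away
        if depletion > allowable_depletion_mm:
            plan.append((day, depletion))
            depletion = 0.0
    return plan

et0 = [5.0, 5.5, 6.0, 6.0, 4.0, 5.0, 6.5]    # mm/day (hypothetical)
rain = [0.0, 0.0, 0.0, 12.0, 0.0, 0.0, 0.0]  # mm/day (hypothetical)
plan = irrigation_plan(et0, rain, kc=1.15, allowable_depletion_mm=15.0)
for day, mm in plan:
    print(f"day {day}: irrigate {mm:.1f} mm")
```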

D7. Electronic Car Predictive Maintenance

• Background
• Car repair and predictive maintenance are important issues in
the automobile industry
• Pure electric cars are relatively new

• Task Goal
• Model knowledge graphs of how the subsystems of an electric
car function
• Study the sensors available in new cars
• Detect car problems from sensor data
• Predict maintenance requirements based on sensor signals
• Take other information, such as environmental and demographic
patterns, into consideration.
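The knowledge-graph goal can be sketched as a set of triples linking sensors to the subsystems they monitor and subsystems to what they depend on. A breadth-first walk from an alarming sensor then lists the components that could explain the alarm. All component names and relations below are hypothetical and do not describe any real vehicle.

```python
from collections import deque

# Hypothetical knowledge-graph triples (subject, relation, object) for an EV.
TRIPLES = [
    ("battery_temp_sensor", "monitors", "battery_pack"),
    ("battery_pack", "depends_on", "cooling_pump"),
    ("cooling_pump", "depends_on", "coolant_loop"),
    ("inverter_temp_sensor", "monitors", "inverter"),
    ("inverter", "depends_on", "coolant_loop"),
    ("drive_motor", "depends_on", "inverter"),
]

def neighbors(node, relation):
    """Objects linked from `node` by `relation`."""
    return [o for s, r, o in TRIPLES if s == node and r == relation]

def root_cause_candidates(sensor):
    """The monitored component plus everything it transitively depends on,
    in breadth-first order."""
    queue = deque(neighbors(sensor, "monitors"))
    seen = []
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.append(node)
        queue.extend(neighbors(node, "depends_on"))
    return seen

print(root_cause_candidates("battery_temp_sensor"))
# Components shared by two alarming sensors are natural first suspects.
shared = set(root_cause_candidates("battery_temp_sensor")) & set(root_cause_candidates("inverter_temp_sensor"))
print(shared)
```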

D8. Autonomous Driving

• Background
• Autonomous driving is becoming mature
• Autonomous driving has to account for complex situations on
the road.
• Task Goal
• Explore and experiment with autonomous driving technologies
• Utilize sensors to derive optimal strategies for driving the
car.
• Build game theory and Bayesian network models of the complex
behaviors on the road.
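The Bayesian part of this goal can be illustrated at its smallest scale. The sketch below updates the belief that a pedestrian at the curb intends to cross after a noisy "facing the road" detection. It then picks the action with the lowest expected cost. The prior, sensor likelihoods, and cost table are invented for illustration. A full system would use a Bayesian network over many variables and game-theoretic models of other road users.

```python
def posterior(prior, p_obs_given_cross, p_obs_given_not):
    """Bayes' rule: P(cross | observation)."""
    num = p_obs_given_cross * prior
    return num / (num + p_obs_given_not * (1 - prior))

# Cost of each action given whether the pedestrian actually crosses (hypothetical units).
COSTS = {
    "proceed": {"cross": 100.0, "stay": 0.0},
    "slow_down": {"cross": 5.0, "stay": 2.0},
    "stop": {"cross": 0.0, "stay": 4.0},
}

def best_action(p_cross):
    """Return the minimum-expected-cost action and all expected costs."""
    expected = {a: c["cross"] * p_cross + c["stay"] * (1 - p_cross)
                for a, c in COSTS.items()}
    return min(expected, key=expected.get), expected

p = posterior(prior=0.2, p_obs_given_cross=0.9, p_obs_given_not=0.3)
action, expected = best_action(p)
print(f"P(cross | facing road) = {p:.3f}, action = {action}")
```

Without the observation, the prior of 0.2 favors slowing down. After the detection, the posterior of about 0.43 tips the decision to stopping.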

Example: Mobile Cognition in complex scenario

D9. Smart Cabin of Electrical Vehicles
• Background
• Cars are being connected with all kinds of systems in a city
• Novel in-car applications based on a digital human platform.

• Task Goal
• Explore novel car driving experiences via a digital human

D10. Social Policy Monitoring

• Background
• Social issues and policies affect
people's lives

• Task Goal
• Mine information from social media to
analyze the impact of social policy.
• Analyze the effectiveness of policy
making

D11. International Relations and Policy Monitoring
• Background
• Relationships between countries have been a major factor in
world policy changes

• Task Goal
• Large-scale data mining of the evolution of international
relationships
• Visualize relationship changes and create early alerts for them
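For the early-alert goal, one classic change-detection technique is the one-sided CUSUM. It accumulates how far each new observation falls below an expected level, minus a slack k, and raises an alert when the running sum passes a threshold h. The sketch assumes a hypothetical monthly "relationship score" between two countries (for example, derived from the tone of news events). The target level, k, and h are assumptions to be tuned on real data.

```python
def cusum_alerts(scores, target, k, h):
    """One-sided (downward) CUSUM: S_t = max(0, S_{t-1} + (target - x_t) - k).
    Returns indices where S_t exceeds h; S is reset after each alert."""
    s = 0.0
    alerts = []
    for t, x in enumerate(scores):
        s = max(0.0, s + (target - x) - k)
        if s > h:
            alerts.append(t)
            s = 0.0
    return alerts

# Hypothetical monthly relationship scores (1.0 = very friendly).
monthly_score = [0.8, 0.9, 0.7, 0.85, 0.8, 0.5, 0.4, 0.35, 0.3, 0.4]
print(cusum_alerts(monthly_score, target=0.8, k=0.05, h=0.5))
```

The slack k keeps small month-to-month dips from accumulating. The first alert fires in month 6, one month after the decline begins.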

D12. AI Chips – AI System on Chip

• Background
• Hardware AI chip design
is becoming increasingly
popular

• Task Goal
• Explore the functions and
roadmaps of Edge AI Chips

D13. AI Chips -- Neural Processing Units

• Background
• Hardware AI chip design is
becoming increasingly
popular

• Task Goal
• Explore the functions and
roadmaps of AI Chips based
on Neural Processing Units
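Much of what an NPU accelerates is low-precision integer arithmetic, which the sketch below mimics in software. It applies symmetric per-tensor int8 quantization (scale = max|x| / 127) and computes a dot product with integer multiply-accumulate, as an NPU's MAC array would. It then rescales the result to compare against the floating-point answer. The vectors are arbitrary. Real NPUs differ in quantization schemes, accumulator widths, and rounding rules.

```python
def quantize(values):
    """Symmetric int8 quantization: q = round(x / scale), scale = max|x| / 127.
    Falls back to scale 1.0 for an all-zero tensor."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def int8_dot(a, b):
    """Dot product computed on int8 codes, then rescaled to real units."""
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer accumulate (int32 on hardware)
    return acc * sa * sb

weights = [0.42, -1.30, 0.07, 0.95, -0.61]
activations = [1.10, 0.25, -0.80, 0.60, 0.33]
exact = sum(w * x for w, x in zip(weights, activations))
approx = int8_dot(weights, activations)
print(f"float: {exact:.4f}  int8: {approx:.4f}  abs error: {abs(exact - approx):.4f}")
```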

D14. Exploration in Immersive Environment
• Background
• Augmented reality is becoming more popular, and more and more
devices have become available in the market.
• So far, little research has been done and few systems are
available for exploring networks in such environments.

• Task Goal
• A team will design and implement augmented reality
applications based on Google Lens or Graphen
Space.

D15. Computer Vision Enhanced Immersive Environment

• Background
• Augmented reality is becoming more popular, and more and more
devices have become available in the market.
• Computer vision techniques, such as object recognition, can
make these applications more intelligent and expand what they
can achieve.
• Project Goal
• A team will design and implement an augmented reality
application based on HoloLens or Google Lens.
• Some computer vision techniques will be implemented, such as
object recognition and OCR.
• The team is encouraged to propose interesting usage scenarios
showing how these techniques can seamlessly enhance the user
experience of HoloLens or Google Lens.
