
Using AI For OSINT - A Comprehensive Guide

The document is a comprehensive guide on using AI for Open Source Intelligence (OSINT) through data scraping techniques from social media platforms, specifically focusing on tools like Apify and PhantomBuster. It covers methods for scraping both public and private accounts, analyzing data with ChatGPT prompts, and cross-referencing information to gain insights. The guide emphasizes the versatility of the techniques across different social media platforms and provides step-by-step instructions for effective data extraction and analysis.


A.I. OSINT
A COMPREHENSIVE GUIDE
By An0n Ali

Table of Contents

01 DATA SCRAPING TOOLS

02 SCRAPING DATA
• Public Accounts (Apify)
• Private Accounts (PhantomBuster)

03 PROMPTS & SCENARIOS
• Chapter Introduction
• Exploring Insights from Posts, Comments & Tags
• Additional Prompts
• Cross-Referencing Data

04 CLOSING REMARKS

WHAT THIS GUIDE COVERS:

Data Scraping: Learn methods of data scraping from both
public and private social media profiles (that you follow). These
will be clear step-by-step instructions (with conceptual
explanations).

Exclusive Prompting Techniques & Scenarios: Learn how to
perform analysis on this data with various ChatGPT prompts
and unique strategies, developed and used by An0n Ali in his
own OSINT investigations.

Cross-Referencing Data: The guide also covers multiple prompts
that demonstrate how to cross-reference data across different
profiles of a target, offering a comprehensive understanding.

Cross-Platform Applicability: And yes, the techniques and
prompts in this guide are versatile and can be adapted to any
social media platform, not just Instagram.

© COPYRIGHT NOTICE

2024 An0n Ali. All rights reserved.

This PDF is protected by copyright law. Unauthorized
reproduction, distribution, or sharing of this document, in whole
or in part, is strictly prohibited. This guide is intended solely for
the personal use of the purchaser. Any unauthorized copying or
sharing of this material on other platforms, websites, or through
any other means is prohibited and may result in legal action.

Data Scraping Tools

BEFORE WE BEGIN...

If you haven’t seen my YouTube video, “How to Use AI for
OSINT,” I strongly suggest watching it first.

You can view it here.

The video provides crucial context and background that will
help you better understand the concepts and techniques
covered in this guide.

INTRODUCING THE TOOLS...
Now that you've seen the video, let’s jump into my favorite
data scraping tools: Apify & PhantomBuster.

WHY APIFY? (FIRST PRIORITY)

Pros:

1. Offers a free plan with 10,000 compute units (CU),
which is sufficient for small-scale OSINT tasks.

2. Free credits reset every month, allowing
continuous use without payment.

3. Apify has a larger library of web scraping scripts
than PhantomBuster.

4. No session tokens are required to scrape data
from social media profiles.

Cons:

1. It doesn’t scrape data from private accounts.

WHY PHANTOMBUSTER?

Pros:

1. Allows scraping data from private accounts (that
you follow).

2. Beyond scraping, it also offers automation tools for
social media actions like following, messaging, and
interacting with posts, which can be valuable for
interactive OSINT operations.

Cons:

1. Only offers a 14-day trial with limited data
extraction; a paid plan is required afterward.

2. It requires your social media account’s session
cookies for access.

3. Scraping data using your social media account
may result in your account being banned.

Note: I recommend using PhantomBuster only if you need to
scrape data from a private account; otherwise, use Apify.

Scraping Data

APIFY – PUBLIC ACCOUNTS

STEP 1: CREATE AN ACCOUNT ON APIFY


Go to www.apify.com and sign up for a free account.

STEP 2: ACCESS THE APIFY STORE
In the left-hand panel of the Apify console, click on the
"Store" button.

This will take you to the Apify Store, where you can
browse and find various scripts for web scraping. These
are called “Actors” within Apify.

STEP 3: SEARCH FOR SCRAPERS

Our goal is to:

1. Scrape all comments from an Instagram account.
2. Filter out comments made by a specific username.

To achieve this, start by searching for "Instagram" in the
Apify store.

Next, from the list of available actors, select the one
named "Instagram Scraper."

Concept: We pick this specific actor because it's built
to extract data from an Instagram profile, such as
posts, comments, followers, and more.

The other options, as indicated by their names, are
intended for other types of data extraction tasks.

STEP 4: CONFIGURE SCRAPING SETTINGS
Input the Profile URL: Enter the URL(s) of the Instagram
profile(s) you want to scrape.

Select Data Points: Choose the specific data points you


want to scrape, such as posts, followers, following,
comments, etc.

In our scenario, the appropriate option would be
“Scrape Posts” from each page.

Here’s why:

• The 'Scrape comments' option only gives us the
comments, without the links to the posts where
those comments were made.

• Since our objective is to find comments along with
the links to the posts where they appear, we'll
choose the 'Scrape posts' option instead.

Max Items: Specify the maximum number of posts to extract,
or leave it blank to scrape all posts.

Optional: You can also configure settings such as:

1. Newer than: Only scrape posts newer than the date you
enter. This is useful if you’re only interested in recent data.

2. Scrape based on search query instead of URL: This option
isn’t applicable to our scenario, but it lets you search for
posts or profiles using specific keywords or hashtags, rather
than targeting a specific user profile directly. It’s
particularly useful for performing OSINT on broader topics
instead of focusing on a single individual.

Run Options:

Always keep the run settings as default in basic OSINT scenarios.

STEP 5: RUN THE ACTOR
Click “Save and Start” to begin scraping. You can then
monitor the actor's progress as it fetches the data.

STEP 6: VIEW AND DOWNLOAD THE DATA
After the scraping is complete, the actor will display a
"Succeeded" prompt, like the one shown below.

Click on “Export results” and download the dataset in
CSV format.

STEP 7: ANALYZE THE DATA WITH CHATGPT
Finally, use the scraped data with ChatGPT for specific
analyses.

Since we want to find the posts where a specific
user has commented, we can prompt GPT with
something like:

Prompt: "I'm providing you with a dataset of all the
posts made by some Instagram accounts. Please
identify and list links to all the posts where [username]
has commented."

Upon hitting enter, ChatGPT will list links to all the posts
where that user has commented.
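
If you want to double-check ChatGPT's answer locally, the same filter can be sketched in a few lines of Python. This is only a minimal illustration, not the exporter's documented schema: the column names used here (`url` for the post link and `latestComments/<n>/ownerUsername` for each commenter) are assumptions about how the CSV export is flattened, and the sample rows are made up. Check the header row of your own file and adjust the names accordingly.

```python
import csv
import io


def posts_commented_by(csv_text, username):
    """Return the post URLs whose comment columns contain `username`.

    Assumed (hypothetical) layout: one row per post, a `url` column, and
    commenter names in columns like `latestComments/0/ownerUsername`.
    """
    target = username.strip().lstrip("@").lower()
    urls = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        commenters = {
            (value or "").strip().lower()
            for column, value in row.items()
            if column
            and column.startswith("latestComments/")
            and column.endswith("/ownerUsername")
        }
        if target in commenters and row.get("url"):
            urls.append(row["url"])
    return urls


# Made-up sample data standing in for an exported CSV file.
SAMPLE = """url,latestComments/0/ownerUsername,latestComments/0/text,latestComments/1/ownerUsername,latestComments/1/text
https://example.com/p/1,alice,nice,bob,wow
https://example.com/p/2,carol,cool,,
https://example.com/p/3,bob,great,alice,lol
"""

print(posts_commented_by(SAMPLE, "bob"))
```

To run it on a real export, read the file's contents into a string and pass that in place of `SAMPLE`. A deterministic check like this is useful because a language model can miss or invent rows in a large dataset.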

PRIVATE ACCOUNTS - PHANTOMBUSTER

STEP 1: SIGN UP
Go to www.phantombuster.com and click on “Start free
trial” to create an account.

Use your email address to sign up.

Once done, open your email inbox and verify your
PhantomBuster account.

Click on “Browse Phantoms” and head to the “Phantom
Store”.

STEP 2: CHOOSE AN INSTAGRAM PHANTOM
From the left-hand panel under the Filters section, click
on “Instagram”.

For the same scenario as before, select the “Instagram
Profile Post Extractor”.

STEP 3: SET UP THE PHANTOM
Click on “Use this Phantom” to add it to your
dashboard.

Next, you’ll need to connect your Instagram account
by providing your session cookie.

You can do this either by installing the PhantomBuster
browser extension or by manually pasting the session
cookies using your browser’s developer tools.

For simplicity, I’m going with the browser extension
option.

1. After installing, log in to your Instagram account in
your browser.

2. Refresh the session cookie page.

3. Once the session cookies appear, click on “Connect
to Instagram”.

Enter the Profile URL: Next, input the URL of the private
Instagram account you want to scrape.

Configure Data to Scrape: Leave both of these empty if
you want to scrape all posts from all the accounts that
you input.

Launch Settings: Finally, select the Launch Frequency
Settings according to your needs.

Since we are looking to scrape data from one (or
multiple) profile(s) and are likely to do it only once, the
best settings would be as shown in the screenshot.

Concept: However, just for your understanding, this is
what the other options do:

1. Repeatedly: This option lets you run the task
multiple times on a schedule (e.g., every hour,
every day). It’s useful if you need to monitor a
profile continuously over time.

2. After another Phantom: This option schedules your
task to run after another Phantom task has
completed. This is useful if you have a sequence of
tasks that need to run in a specific order.

3. Advanced: This allows for more complex
scheduling setups, such as specific times of day or
advanced recurrence rules.

Finally, click on Save.

STEP 4: LAUNCH THE PHANTOM
From the next window, click on “Launch” to launch the
Phantom, and track the progress of the scraping in the
PhantomBuster dashboard.

STEP 5: WE’RE NOT DONE YET!
After the Phantom has finished running, you’ll need to
return to the "Phantom Store" and choose the
"Instagram Post Commenters Export" Phantom.

This is because the dataset we obtained earlier only
includes links to Instagram posts from the profile, but it
doesn’t include user comments on those posts.

To gather that information, we’ll need to run the
Commenters Export Phantom using the steps outlined
below:

1. Connect to Instagram: After clicking on “Use this
Phantom”, insert your Instagram cookies and click on
“Connect to Instagram”, just like before.

2. Posts to Process: From the “Posts to Process” window,
click on “My Phantoms” and select the dataset from
the previous Phantom.

3. Behavior: Keep these empty, just like before, and
launch the Phantom.

STEP 6: DOWNLOAD AND ANALYZE DATA
Once the Phantom has completed its run, you can
download the dataset in CSV format and ask GPT to list
all comments from a specific user, using a prompt
similar to the one we used for the Apify dataset.

Prompt: "I'm providing you with a dataset of comments
from various Instagram posts. Please filter and list all
the comments made by [specific username]."

Prompts & Scenarios

WHAT WE WILL COVER

In this section, we’ll delve into the advanced applications
of AI for OSINT, demonstrated through specific scenarios.
Each scenario is designed to showcase how AI can be
leveraged to extract and analyze data from social media
profiles.

A Few Things to Keep in Mind:
Before diving into the specific scenarios, here are a
few important points to consider:

1. Comprehensive Data Matters

• The accuracy and depth of your analysis will
improve with the amount of data available
from the target profile.

• The more data you have, the richer and more
precise your findings will be.

2. Applicability Across Platforms

• While Instagram profiles and scrapers are
used as examples throughout these scenarios,
the principles and prompts provided are
applicable across all social media platforms.

• These scenarios are only meant to illustrate
the possibilities with large datasets and AI,
offering a framework you can adapt to any
platform or target.

• Please read each prompt carefully and
customize it according to the specific
username and social media platform you are
working with.

EXPLORING INSIGHTS FROM POSTS,
COMMENTS & TAGS

Locations of Frequent Activity

1. Through Posts

• Use this scraper: “Instagram Post Scraper”

• ChatGPT Prompt: "I've provided you with a dataset
of all [username]'s posts. Identify and list any posts
that include location data or mention specific
places (even in captions). Summarize the most
frequently tagged locations."

2. Through Tags/Mentions

• Use this scraper: “Instagram Mentions Scraper”

• ChatGPT Prompt: "I've provided you with a dataset
of all the posts where [username] is tagged.
Identify and list all posts that include location data
or mention specific places (even in captions)."

3. Through Comments

• Use this scraper: “Instagram Post Scraper”

• ChatGPT Prompt: "I've provided you with a dataset
of all [username]'s posts. Identify and list all
comments that mention locations or geographical
information. If a comment is not found, give a link
to the post instead."

Advanced Cross-Referencing of Location Data:

Enhance location accuracy by prompting ChatGPT to
cross-reference posts, comments, and mentions/tags.
This helps identify locations where the user frequently
posts or is tagged, as well as locations mentioned
often in captions or replies.

For example:

Prompt: "I’ve provided you with datasets of posts,
comments, and mentions/tags from [username]'s
profile. Cross-reference these datasets and identify
locations where [username] posts most often, is tagged
frequently, and mentions in captions or comment
replies. Summarize the locations with the highest
frequency and provide insights into [username]'s most
likely areas of activity."

Map Out Relationships / Connections

1. Through Posts

• Use this scraper: “Instagram Post Scraper”

• ChatGPT Prompt: "I've provided you with a dataset
of all [username]'s posts. Identify and list posts that
mention or tag other users. Provide details on who
is tagged most frequently and the context."

2. Through Tags/Mentions

• Use this scraper: “Instagram Mentions Scraper”

• ChatGPT Prompt: "I've provided you with a dataset
of all the posts where [username] is tagged. List
the usernames of people who tag this person the
most frequently. Provide a count for each
username."

3. Through Comments

• Use this scraper: “Instagram Post Scraper”

• ChatGPT Prompt: "I've provided you with a dataset
of all [username]'s posts. List the usernames that
appear most frequently in the comments. Provide
a count for each username. Also display the
comments that include a [heart] emoji
somewhere in them."

Cross-Referencing of Relationship Data:

This allows you to identify key individuals who frequently
appear across all categories, helping to pinpoint the
user's closest connections and the context of their
interactions.

Example Prompt:

Prompt: "I've provided you with datasets from
[username]'s posts, tags/mentions, and comments.
Cross-reference these datasets and identify individuals
who appear most frequently across all categories.
Highlight users who are both frequently tagged and
mentioned in comments, and summarize the context of
these interactions to identify [username]'s closest
connections."

Tip: You can also ask GPT to filter posts by university or
workplace, and then manually check those posts to
see if your target has posted a photo with someone
without specifically tagging them. The same applies if you
filter posts by different events (graduations, birthdays, etc.).

Identify Behavioral Patterns

1. Through Posts

• Use this scraper: “Instagram Post Scraper”

• Prompt 1: "I've provided you with a dataset of all
[username]'s posts. Analyze the tone, language,
and content of the posts to determine their
personality traits, interests, or emotional state."

• Prompt 2: "I've provided you with a dataset of all
[username]'s posts. Identify recurring themes,
topics, or hashtags that the user frequently posts
about, revealing their main areas of interest."

2. Through Tags/Mentions

• Use this scraper: “Instagram Mentions Scraper”

• Prompt 1: "I've provided you with a dataset of all
the posts where [username] is tagged. Assess who
the user interacts with most frequently and in what
context, which could reveal their social behavior
and preferences."

• Prompt 2: "I've provided you with a dataset of all
the posts where [username] is tagged. Track the
types of events, places, or activities the user is
frequently tagged in, which can point to trends or
hobbies."

3. Through Comments

• Use this scraper: “Instagram Post Scraper”

• Prompt 1: "I've provided you with a dataset of all
[username]'s posts. Examine the types of
comments the user makes (supportive, critical,
casual) to identify their general demeanor or
attitudes towards certain people or topics."

• Prompt 2: "I've provided you with a dataset of all
[username]'s posts. Evaluate the user's comments
on certain topics or in specific groups, helping to
identify their interests or opinions on various
matters."

Advanced Cross-Referencing of Behavioral Patterns

Example Prompt: "I've provided you with datasets from
[username]'s posts, tags/mentions, and comments.
Cross-reference these datasets to analyze the user's
tone, language, and interactions. Identify recurring
themes or behaviors across all data points to draw
conclusions about the user's personality traits, interests,
and social preferences."

ADDITIONAL PROMPTS

For Analysis on Posts:

1. Find Posts with User Mentions: "Identify posts that
mention or tag other users. Provide details on who is
tagged most frequently and the context."

2. Recurring Locations: "Identify any posts that include
location data or mention specific places. Summarize
the most frequently tagged locations."

3. Identify Content Related to Specific Events: "Search
for posts related to specific events, holidays, or
significant dates. Provide a summary of the content
and engagement."

4. Sentiment Analysis of Captions: "Perform a sentiment
analysis on the captions of all posts. Provide a summary
of the overall sentiment (positive, negative, neutral)."

5. Analyze Captions for Insights: "Analyze the language
used in captions to identify common phrases, calls to
action, or personal insights."

6. Hashtag Usage Analysis: "List the most frequently
used hashtags in the posts. Highlight any trends or
patterns in their usage."

7. Identify High-Engagement Posts: "Identify the posts
with the highest engagement (likes, comments, shares).
Provide details on the content and any common
themes."

8. Identify Key Milestones or Announcements: "Identify
posts that mark significant milestones or
announcements (e.g., graduation, life events).
Summarize the impact of these posts."
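
Counting tasks such as the hashtag usage prompt are easy to verify outside ChatGPT. The following is a minimal Python sketch, not part of any scraper's tooling: it assumes you have already pulled the caption text out of your dataset into a plain list (the `captions` list below is made-up sample data), and it treats hashtags case-insensitively.

```python
import re
from collections import Counter

# A hashtag is taken to be "#" followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")


def top_hashtags(captions, n=3):
    """Return the n most frequent hashtags as (tag, count) pairs."""
    counts = Counter()
    for caption in captions:
        # Posts without a caption may appear as None or an empty string.
        counts.update(tag.lower() for tag in HASHTAG.findall(caption or ""))
    return counts.most_common(n)


# Made-up sample captions for illustration only.
captions = [
    "Sunset run #Running #fitness",
    "New shoes! #running",
    "Race day #running #marathon #fitness",
    None,
]

print(top_hashtags(captions))
```

Comparing a tally like this against ChatGPT's answer is a quick way to catch miscounts on larger datasets.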

For Analysis on Comments

1. Locate Geographical Information: "Identify
comments that mention locations or geographical
information."

2. Identify Frequent Commenters: "List the usernames
that appear most frequently in the comments. Provide
a count for each username."

3. Find Comments Related to Specific Topics: "Find
comments that mention specific topics, events, or
names (e.g., 'vacation', 'John', 'concert')."

4. Detect Patterns in Comments: "Identify recurring
phrases or words in the comments. Highlight any that
are unusual or highly specific."

5. Sentiment Analysis: "Analyze the sentiment of the
comments (positive, neutral, negative). Provide a
summary of the overall sentiment on the profile."

6. Uncover Potential Connections: "Analyze the
usernames in the comments to identify any accounts
that appear to be related (e.g., similar names, mutual
followers)."

7. Analyze Comment Frequency Over Time: "Analyze
the frequency of comments over time. Identify any
periods with a spike in activity."

8. Identify Comments with Contact Information: "Search
for any comments that contain phone numbers, email
addresses, or other contact information."

9. Determine the Most Engaging Posts: "Determine
which posts received the most comments. Identify any
common themes among these posts."

10. Identify Requests or Calls to Action: "Identify
comments where followers ask the user for something
(e.g., 'Please follow back', 'Can you post about...')."
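
The two purely numerical prompts in this list (frequent commenters and comment frequency over time) can also be cross-checked with a short script. This is a rough sketch under stated assumptions: each comment is represented as a dictionary with hypothetical `username` and `timestamp` keys, and timestamps are ISO 8601 strings without a timezone suffix. Real exports will use different field names and formats, so adapt the keys to your own dataset.

```python
from collections import Counter
from datetime import datetime


def frequent_commenters(comments, n=2):
    """Return the n most active commenters as (username, count) pairs."""
    return Counter(c["username"].lower() for c in comments).most_common(n)


def comments_per_month(comments):
    """Return a {"YYYY-MM": count} mapping, sorted chronologically."""
    months = Counter(
        datetime.fromisoformat(c["timestamp"]).strftime("%Y-%m")
        for c in comments
    )
    return dict(sorted(months.items()))


# Made-up sample records; the key names are assumptions for illustration.
comments = [
    {"username": "alice", "timestamp": "2024-01-05T10:00:00"},
    {"username": "bob", "timestamp": "2024-01-09T18:30:00"},
    {"username": "Alice", "timestamp": "2024-02-01T08:15:00"},
    {"username": "alice", "timestamp": "2024-02-14T21:00:00"},
    {"username": "bob", "timestamp": "2024-02-18T09:45:00"},
    {"username": "carol", "timestamp": "2024-02-20T12:00:00"},
]

print(frequent_commenters(comments))
print(comments_per_month(comments))
```

A month whose count stands well above its neighbours is a candidate "spike"; what counts as a spike is a judgment call that depends on the size of the dataset.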

For Analysis on Tags/Mentions

Use this Scraper/Actor: “Instagram Mentions Scraper”

1. Analyze the Context of Tags: "Identify common
themes or topics in the posts where this person is
tagged (e.g., events, celebrations, work)."

2. Identify Frequent Taggers: "List the usernames of
people who tag this person the most frequently.
Provide a count for each username."

3. Geographical Information: "Identify posts where this
user is tagged or mentioned that contain location data
or mention specific places in the captions."

4. Sentiment Analysis: "Perform a sentiment analysis on
the posts where this user is tagged or mentioned.
Provide a summary of the overall sentiment."

5. Identify Relationships: "Analyze the tags and
mentions to identify key individuals that frequently
appear with this user. Summarize the nature of their
interactions."

6. Detect Patterns in Tags/Mentions: "Identify any
recurring themes or patterns in the posts where this user
is tagged or mentioned. Highlight any trends that stand
out."

7. Analyze Comment Frequency Over Time: "Analyze
the frequency of comments over time. Identify any
periods with a spike in activity."

8. Assess Influence Based on Tags/Mentions: "Evaluate
how often this user is tagged or mentioned in posts
related to influential or trending topics. Provide insights
into their social influence."

9. Identify Key Events Through Tags/Mentions: "Identify
any key events or significant milestones where this user
is frequently tagged or mentioned. Summarize the
impact of these events."

CROSS-REFERENCE DATA BETWEEN MULTIPLE PROFILES

If you’re able to get your hands on multiple social
media profiles belonging to a subject, you can also ask
ChatGPT to combine and cross-reference the data
between them to provide more accurate results.

To achieve this, you’ll need to use appropriate scrapers
within Apify or PhantomBuster according to the social
platform you’re scraping.

What You Might Discover:

These prompts will help you explore various aspects of
your subject's online presence, including consistent
behavioral patterns, overlapping locations, shared
connections, and even potential inconsistencies across
their profiles. By cross-referencing this data, you can
gain a deeper understanding of the subject's activities,
relationships, and influence across different platforms.

1. Track Common Connections: "Compare the list of
friends, followers, and connections from [Target's]
Instagram, Facebook, and LinkedIn profiles. Identify
individuals who appear across all three platforms and
analyze the nature of their relationship with [Target].
Highlight any key connections that might reveal close
relationships or professional ties."

Pro-Tip: You can also scrape things like comments from
multiple social media profiles and ask ChatGPT to
analyze the connections or social dynamics based on
this broader dataset. This approach is similar to what
we discussed earlier, but now you're using data from
multiple profiles instead of just one, giving you a more
comprehensive view of how [Target] interacts with others
across different platforms.

2. Identify Overlapping Locations: "Cross-reference the
location data or mentions from [Target's] Instagram
and Facebook posts. Identify locations that frequently
appear on both platforms and analyze whether these
places are significant to [Target]. Provide insights into
[Target's] most common areas of activity."

3. Uncover Potential Inconsistencies: "Compare the
employment history, educational background, and
other personal information listed on [Target's] LinkedIn
profile with the information shared on Facebook and
Instagram. Identify any discrepancies or inconsistencies
that may suggest misrepresentation or hidden aspects
of [Target's] life."

4. Identify Recurring Themes and Interests:
"Cross-reference hashtags, keywords, and topics from
[Target's] Instagram, Facebook, and LinkedIn profiles.
Identify recurring themes or interests that [Target]
frequently engages with across all platforms, and
summarize their main areas of focus."

5. Analyze Public vs. Private Persona: "Cross-reference
the content posted on [Target's] LinkedIn profile with
their Instagram and Facebook posts. Assess any
differences in how [Target] presents themselves
professionally versus personally, and summarize any
contrasting behaviors or interests."

6. Identify Consistent Behavioral Patterns:
"Cross-reference the posts, comments, and interactions from
[Target's] Instagram and Facebook profiles. Identify
consistent behavioral patterns, such as recurring
themes, topics, or tone across both platforms, to gain
insights into [Target's] personality and interests."

7. Analyze Content Frequency and Timing:
"Cross-reference the frequency and timing of posts from
[Target's] Instagram, Facebook, and LinkedIn profiles.
Identify any patterns in when [Target] is most active on
each platform and whether certain events or activities
prompt simultaneous posts across multiple platforms."

8. Identify Changes Over Time: "Compare historical
data from [Target's] Instagram, Facebook, and LinkedIn
profiles. Analyze how [Target's] interests, connections,
or behavior have evolved over time across these
platforms, and provide insights into any significant
changes in their life or career trajectory."

9. Track Professional and Personal Networks: "Compare
the professional connections on [Target's] LinkedIn
profile with their personal connections on Facebook
and Instagram. Identify individuals who appear in both
networks and analyze whether there are any overlaps
between [Target's] work and personal life."

10. Detect Potential Network Gaps: "Cross-reference
friends and connections across [Target's] Instagram,
Facebook, and LinkedIn profiles. Identify individuals or
networks present on one platform but absent on others,
revealing different social circles or areas of lower
activity."

11. Assess Social Influence Across Platforms: "Analyze
the level of engagement (likes, comments, shares) on
[Target's] posts across Instagram, Facebook, and
LinkedIn. Cross-reference this data to determine which
platform [Target] has the most influence on and
provide insights into their social reach and impact."

Closing Remarks

BEFORE WE CONCLUDE…

I'd like to emphasize that while this guide mainly
focused on scraping data from Instagram and
performing various analyses, the techniques we've
explored are just the tip of the iceberg.

The methods and tools discussed can be applied
across multiple platforms, allowing you to uncover a
vast array of information about someone using AI
and large datasets. The possibilities are endless:
you simply need to challenge your thinking and
discover new ways to make sense of the data you
collect.

One more thing to keep in mind is that you can
even ask ChatGPT for guidance on how to uncover
specific insights about someone, including what
kind of datasets you might need. This further
expands the possibilities, enabling you to explore
new frontiers in data analysis.

My goal with this guide was to provide you with an
introduction to how emerging Large Language
Models, like ChatGPT, can be used in conjunction
with datasets to reveal valuable insights.

“In this modern age, where information is power,
data truly is as valuable as gold!”

Thank you for reading. I hope this guide has sparked
new ideas, and I encourage you to keep exploring
and pushing the limits of what you can achieve with
data and AI.

- An0n Ali
