Explain Simply and Clearly
Explain what I am going to give you now in a very simple way, in understandable points and
without missing any information.
Got it! Send me what you want me to explain, and I'll break it down into simple, clear points without
missing any details. 😊
Close your eyes for a second. No, like actually do it with me and picture data. Okay, open your eyes.
What did you think of? Was it something like this? A table of numbers and neat rows and columns?
It's the most stereotypical depiction of data. But the reality is, a lot of the data you will encounter in
the real world doesn't start out that way. It's what we call unstructured data. To get more specific,
unstructured data means data that doesn't fit neatly into rows or columns. It's all around us. When
you snap a photo, record a video, or jot down notes in a journal, you're creating unstructured data.
These are human types of information, and they feel very natural to us. In fact, if you're just a
person collecting data for yourself, you'll probably do it in an unstructured way without even
thinking about it. Maybe you just keep a journal to track your workouts with friends. It might look
something like this. Today, there were four people in our running group. We ran 3.2 miles in 29
minutes and 32 seconds, and afterwards, I focused my strength training on my core. This
information could be organized in a structured way too, like in this table where each exercise detail
is a column and each day is a row. You can see the one on the left is more human and natural, while
the one on the right is more rigid but better for analysis. So, to summarize, text data, like your
original journal entry, is considered unstructured. Here are some more examples of unstructured
data. Earlier this week, I took a photo of the sushi options at the grocery store so my wife could pick
one. That's unstructured. If it was structured, maybe it would be more like a menu with each item
and its ingredients listed neatly. I also recorded a video of several potential grill options at the
hardware store, which is unstructured data as well. If I had to put it in rows and columns, what
would I put? Maybe I could record each option in a row, including the price, warranty length, and a
score from one to five, representing how much I liked each one. To summarize, photos and videos
are unstructured. Why does this distinction between structured and unstructured data matter? It's
all about how the data gets collected and processed. At some point, all of this unstructured data
usually needs to be converted into a structured format to be analyzed effectively. Most data
analytics is happening on structured data, although modern techniques are getting better at
analyzing unstructured data directly. We'll talk more about structured data in the next video. Since
most data doesn't start out in rows and columns, it often undergoes a transformation into a
structured form. As a data analyst, you should be mindful of the origins of your data because
whether it's structured or not often affects how easy it is to pre-process and analyze. Unstructured
data typically requires more steps before it becomes useful, but it's also more natural for a person
to generate and can contain unexpected insights due to its level of detail. Let's say you're working
with customer reviews from a travel blogging site (my aspirational job). Reviewers can rate locations
one through five on fun, accessibility, and value, then leave a comment. The comment data is being
generated in an unstructured way. It's just free-form text. Behind the scenes, that text will need to
be processed with steps like translating to English, chunking for storage in a database, and
Printed using ChatGPT to PDF, powered by PDFCrowd HTML to PDF API. 1/11
classifying by positive or negative tone. Meanwhile, the one through five ratings can be analyzed
pretty directly, say, to find an average rating for each location, but may not be as insightful as the
comments, where reviewers can offer insights like, yeah, the amount of bedbugs was
overwhelming, but the coffee was good. These reviews contain a mix of structured and unstructured
information. Even though the comments are unstructured, it's common to store them along with
the structured data. For example, you could store your review data like this, with each row
representing a review, numbers for the different ratings, and then a column that contains the
comment. This comment is still unstructured because it lacks an inherent organization. People can
write whatever they want. More on this technique in the next video. Unstructured data is a natural
byproduct of how we capture and communicate information as humans. It's a raw material that
needs to be organized before it can generate insights. In the next video, we'll discuss structured
data and why it even exists in the first place. I'll see you there.
1. What is Data?
When we think of data, we often imagine neat tables with rows and columns.
But in reality, most data starts out messy and unorganized.
2. Unstructured Data
Data that doesn’t fit into a structured table format is called unstructured data.
Examples: Photos, videos, free-form text (like journal entries or customer reviews).
It’s natural for humans to create unstructured data without thinking about it.
A structured version of a journal entry would be a table where each detail (distance, time, exercise) is in
its own column.
Structured data is better for analysis, but unstructured data feels more natural.
But unstructured data contains richer details (like customer reviews explaining WHY they
liked or disliked something).
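To make the journal-versus-table idea concrete, here is a minimal Python sketch that turns the workout journal entry from the video into one structured row. The field names, the `NUMBER_WORDS` lookup, and the regular expressions are assumptions made for illustration; they only handle sentences worded like this one.

```python
import re

# The journal entry quoted in the video, kept as free-form (unstructured) text.
entry = (
    "Today, there were four people in our running group. "
    "We ran 3.2 miles in 29 minutes and 32 seconds, and afterwards, "
    "I focused my strength training on my core."
)

# Hypothetical lookup for spelled-out counts; a real journal would need more.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def parse_entry(text: str) -> dict:
    """Pull the structured fields out of one workout journal entry."""
    people = re.search(r"(\w+) people", text)
    miles = re.search(r"(\d+(?:\.\d+)?) miles", text)
    time = re.search(r"(\d+) minutes and (\d+) seconds", text)
    focus = re.search(r"strength training on my (\w+)", text)
    return {
        "people": NUMBER_WORDS[people.group(1)],
        "miles_run": float(miles.group(1)),
        "time_seconds": int(time.group(1)) * 60 + int(time.group(2)),
        "strength_focus": focus.group(1),
    }


# One journal entry becomes one row; each key is a column.
row = parse_entry(entry)
print(row)
```

Even this small example shows why unstructured data needs extra steps: the parser breaks as soon as the sentence is worded differently.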
5. Real-World Example
A travel review site collects ratings (1–5) on fun, accessibility, and value. These are structured
data.
The written review (like “too many bedbugs, but great coffee”) is unstructured data.
Both types are stored together, but the unstructured part needs extra processing (like
translation, sentiment analysis, etc.).
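The travel review example can be sketched in Python as well. The rows, location names, and keyword lists below are made up for illustration, and the keyword count is only a stand-in for real sentiment analysis. The point is that the ratings can be averaged directly, while the comment needs its own processing step.

```python
from statistics import mean

# Made-up review rows: the ratings are structured, the comment is free text.
reviews = [
    {"location": "Lisbon", "fun": 5, "accessibility": 4, "value": 4,
     "comment": "Great coffee and friendly people."},
    {"location": "Lisbon", "fun": 3, "accessibility": 4, "value": 2,
     "comment": "Too many bedbugs, but the coffee was good."},
    {"location": "Oslo", "fun": 4, "accessibility": 5, "value": 3,
     "comment": "Terrible weather, overpriced food."},
]


def average_rating(rows, location, field):
    """Structured data: average one rating column for one location."""
    return mean(r[field] for r in rows if r["location"] == location)


# Hypothetical keyword lists; real tone classification is far more involved.
POSITIVE = {"great", "good", "friendly"}
NEGATIVE = {"bedbugs", "terrible", "overpriced"}


def tone(comment):
    """Unstructured data: a crude tone label based on keyword counts."""
    words = {w.strip(".,!?").lower() for w in comment.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "mixed"


print(average_rating(reviews, "Lisbon", "fun"))
for review in reviews:
    print(review["location"], tone(review["comment"]))
```

The bedbugs-and-coffee comment comes out as "mixed" here, which hints at the extra detail a comment can carry beyond a 1 to 5 score.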
6. Key Takeaway
Unstructured data is everywhere and feels natural.
It needs to be transformed into a structured format for easier analysis.
In the next step, you’ll learn why structured data exists and how it’s used.
Sounds good! Send the next one, and I'll simplify it for you. 😊
Got it! Here are the notes in a clean, structured format for your notebook:
1. What is Data?
2. Unstructured Data
Common examples:
Unstructured Example:
Journal entry: “Today, we ran 3.2 miles in 29 minutes and did core exercises.”
Structured Example:
Table with columns:
Unstructured data = more detailed but harder to process.
5. Real-World Example
Categorization
6. Key Takeaways
These notes are short, clear, and ready for writing in your notebook! Let me know if you need
changes. 😊
Got it! Send the next text, and I’ll convert it into structured notes for your notebook. 😊
When working with computers, you'll often find that you need to impose some structure on your
data. Computers work most efficiently when data is organized in a predefined way, whereas
humans are much more adaptable when information comes in an unexpected form. One big idea I
hope you take away from this video is that structured data essentially exists for computers to store,
process, and analyze. Structured data is all about organizing information into a standardized format
that computers can easily interpret. Most commonly, that looks like organizing data neatly into rows
and columns. That organization itself contains a lot of information. Let's revisit the example of
tracking your workouts. Here's the structured version of that information again, and two more rows
so that you can compare workouts. In each column, you, or more accurately your computer, can
expect the same kind of information. The time column will always have a time. The strength focus
column will always have one of a few options like core, legs, back, and arms. There will never be a
neck day, although to each their own I guess. Each row or day will contain information about each
exercise, even if you didn't run at all. Miles run and people will never be negative. These examples
represent some of the information built into the organization of this data. Building in this
information often involves categorizing data into specific types, such as numerical or categorical.
Numbers can be either discrete or continuous, with discrete meaning a whole counting number. For
example, either you went running with one person or two. There's no such thing as a person and a
half. Continuous means you can have fractional amounts as well. For example, you can run a
fraction of a mile, like 3.2 or 1.1. You also have specialized numerical formats such as time, which
can also be represented in discrete or continuous forms. Categorical data enables you to separate
rows into distinct groups. Core, legs, and back are categories of exercises that can be used to
analyze different types of strength focus. Categorical data is most often represented as text, but
there should be a finite number of distinct groups. Free text, like review comments, has potentially
infinite values and would not make for useful categorical data without further processing.
Categorical data can also be represented numerically, with a discrete number for each group. For
example, maybe for efficiency's sake, you just write 1 for core, 2 for legs, 3 for back. The data is still
categorical even though it is represented with numbers. One other key distinction in structured data is
between time series and cross-sectional data. Time series data tracks changes over time, while
cross-sectional data captures a snapshot in time. The workout data you just saw is considered time
series data, since you'll be able to analyze your mile times and strength focus over time, plus
monitor whether your friends are keeping their promise to train with you. On the other hand, check
out this Instagram bio, which has data like the number of posts and followers, plus username,
image, and a text bio. Is this cross-sectional or time series data? This is cross-sectional, since it
shows information about an account at one moment in time. You wouldn't be able to look at this
data and say anything about followers over time or how often this person changes their profile
picture. You learned earlier that it's common to see unstructured data stored inside a table or
spreadsheet. Here's another example using the same Instagram bio you just saw. You could
represent the structured data from the bio in a table with columns for name, posts, followers, and
following. Each of these columns has a consistent format. They can be processed by a computer. For
example, you can calculate the ratio of followers to following. The bio, this description here, is
unstructured, since it's unorganized text data that cannot be easily processed by a computer. To
keep all this data together, you can tack bio onto this table. It's still unstructured data and will still
need more work to process compared with the other columns, but you can store it together with the
other data just to keep things consistent. To summarize, different columns in a table can be
structured or unstructured. Now that you've seen the core components of both structured and
unstructured data, let's take a step back and think about the difference between these two types of
data from a human perspective. As humans, we're pretty good at interpreting unstructured data.
We can read a book, watch a movie, or enjoy an emotional song without much effort. But for
computers, it's a different story. Computers need data to be organized in a specific way in order to
process it effectively. Although generative AI is making significant progress on interpreting
unstructured data, in general terms, non-AI techniques are going to work best on structured data.
So to recap, structured data is all about organizing information in a way that computers can use
effectively. As a data analyst, you'll frequently derive insights from both structured and unstructured
data. In the next video, you'll explore big data. Big data isn't just a lot of data. I promise there's more
to it. Join me in the next video to learn more.
Computers work best with structured data, whereas humans can handle unstructured data
easily.
Structured data is organized in a predefined way, typically in rows and columns.
This organization helps computers store, process, and analyze data efficiently.
A categorical column (e.g., strength focus) will always contain specific categories like core,
legs, back, arms.
Each row represents a single data entry (e.g., a single workout session).
Special cases: Time can be both discrete (whole hours) and continuous (fractions of seconds).
Can be stored as text (e.g., "Core") or numbers (e.g., "1 for Core, 2 for Legs").
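The column rules described above (counts are whole and never negative, miles can be fractional, strength focus comes from a fixed set) can be written down as a small Python sketch. The sample rows and the function names are assumptions for illustration, and the code 4 for arms is an assumed extension of the "1 for core, 2 for legs, 3 for back" example.

```python
# Made-up workout rows using values mentioned in the video.
workouts = [
    {"people": 4, "miles_run": 3.2, "strength_focus": "core"},
    {"people": 2, "miles_run": 1.1, "strength_focus": "legs"},
    {"people": 1, "miles_run": 0.0, "strength_focus": "back"},
]

# Categorical values stored as numbers; the value 4 for arms is an assumption.
FOCUS_CODES = {"core": 1, "legs": 2, "back": 3, "arms": 4}


def validate(row):
    """Check the rules that the table structure is expected to guarantee."""
    # Discrete: a whole, non-negative count of people.
    if not isinstance(row["people"], int) or row["people"] < 0:
        raise ValueError("people must be a non-negative whole number")
    # Continuous: fractional miles are fine, negative miles are not.
    if row["miles_run"] < 0:
        raise ValueError("miles_run cannot be negative")
    # Categorical: only a finite set of groups is allowed.
    if row["strength_focus"] not in FOCUS_CODES:
        raise ValueError("unknown strength focus: " + row["strength_focus"])


def encode(row):
    """Return a copy of the row with the category replaced by its code."""
    return {**row, "strength_focus": FOCUS_CODES[row["strength_focus"]]}


for workout in workouts:
    validate(workout)

encoded = [encode(workout) for workout in workouts]
print(encoded)
```

The encoded column holds numbers, but it is still categorical: averaging 1, 2, and 3 would not say anything meaningful about strength focus.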
Unstructured data (like text in a bio) needs additional processing before analysis.
As a data analyst, you will work with both structured & unstructured data.
Upcoming discussion: How Big Data works and why it’s important.
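The Instagram bio example from the video can also be sketched in Python. The account name, the numbers, and the bio text are made up for illustration. The structured columns support a direct calculation, while the bio column is simply stored alongside them.

```python
# One made-up profile row: a cross-sectional snapshot of an account.
profile = {
    "name": "example_user",
    "posts": 120,
    "followers": 1500,
    "following": 300,
    "bio": "Runner. Coffee lover. Occasional griller.",  # unstructured text
}


def follower_ratio(row):
    """Ratio of followers to following, or None if following is zero."""
    if row["following"] == 0:
        return None
    return row["followers"] / row["following"]


ratio = follower_ratio(profile)
print(ratio)

# The bio is kept in the same row, but it has no consistent format to
# compute on; any analysis of it would need extra text processing first.
print(profile["bio"])
```

Because the row captures a single moment, it says nothing about how the follower count changes over time; that would need one row per date, which is time series data.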
You may have heard the term big data thrown around, but what does that actually mean? You might
hear the term and think, well, that just means dealing with large volumes of data. That's certainly a
part of it, but there's more to it than that. Big data is defined by three key attributes known as the
three Vs, volume, variety, and velocity. Let's start with volume. This is probably the most
straightforward characteristic of big data. Datasets nowadays are often so large, they pose
significant challenges in terms of storage and computation. Think about a company like Amazon
and the sheer volume of orders they process, around 12 to 19 million of them every day. In fact,
since you started watching this video, Amazon has probably processed more than 6,000 orders.
Volume matters because storing and analyzing that data requires some serious computational
power. If you were working at Amazon and wanted to analyze transactions, even from a single day,
you wouldn't be able to do so on your laptop at home, nor would you be able to manually copy and
paste the transaction data from one location to another. Then there's variety. In the past, the data
that analysts work with tended to be structured, meaning it fit neatly into databases or
spreadsheets. But the 21st century has seen an explosion of unstructured data, like images, text,
video, and even augmented reality data from products like the Apple Vision Pro. This explosion
coincides with the rise of the internet and social media in particular. There are way more selfies now
than about 20 years ago because the first smartphone with the front-facing camera only came out
in 2010. On a platform like Facebook, for example, when a user goes to create a new post, they can
add a photo, tag people, choose a feeling, check in at a location, start a fundraiser, or even stream
live video. Each of these post types requires its own methods for pre-processing and analysis. If you
want to answer a seemingly simple question like, what does a particular user typically post about?
You would need to analyze an extremely diverse set of data. The third V is velocity. This refers to the
speed at which data is being generated. You saw the rate at which Amazon must process orders a
moment ago, but it's not just a tech thing. During a hurricane, sensors and satellites collect massive
amounts of data every second, which must be rapidly analyzed to predict the hurricane's movement
pattern. If analysts can't wrangle that data quickly, people in the storm's path could get delayed
information. The sheer velocity of data on social media in particular is staggering. Within YouTube's
first six months, the site was getting over 100,000 video views per day. Soon, so many videos were
uploaded that human moderation simply wasn't feasible, and YouTube turned to automated
techniques. In other words, the velocity of data had consequences for how YouTube moderated its
content, which has had continued ripple effects on content moderation today. So that's the original
three Vs, and you could stop there, but there's a bit of a movement to add even more Vs to the
framework. While the first three Vs you just saw are the most important factors that differentiate big
data from what you might call small data, let's take a look at the additional Vs. The fourth V is
veracity. This refers to the quality of the data, and it's a crucial consideration, especially as the
volume, variety, and velocity of data increase. Is the data coming from a trustworthy source? Could
it have been corrupted along the way? As the saying goes, garbage in, garbage out. If your data is
of poor quality, your insights and consequently your business decisions will also be poor. The fifth V
is value. The idea here is that data is only worth analyzing if it actually provides some benefit. At
Netflix, for example, the vast amount of engagement data we collect feeds into the
recommendation system, allowing for personalized suggestions. Without that data, everyone would
just get the same generic recommendations, like that Netflix account your whole family shares. You
know, the one where Boss Baby is recommended right next to Scream 2. While big data is prevalent
in today's world of data analytics, there are plenty of situations where relatively small data sets
produce valuable insights. It might surprise you to learn that there are only about 6,000 hospitals in
the US. That's not a lot compared to the 290,000 Tinder matches every minute. Each hospital might
only serve a few thousand patients each year. An intensive care unit might only have a couple of
dozen beds. It's perfectly feasible to analyze that data on your laptop, and the data generated in
these contexts can still be incredibly valuable for improving patient outcomes. As a data analyst,
your job is to consider the data in the context of the problem you're trying to solve. Sometimes that
will mean working with massive, complex data sets, and other times it will mean investigating a
smaller, more focused data set. Great work completing this lesson. You've seen the many types of
data you'll be working with as an analyst. In the upcoming practice lab, you'll work with both
structured and unstructured data in an e-commerce case study for a gift shop. I know you'll enjoy it.
Once you've finished the practice lab and practice assessment, join me in the next lesson to see how
data fits into the bigger picture of an organization. We'll be talking about data teams and data
ecosystems. Intriguing. I'll see you there.
This explanation gives a solid introduction to big data and its characteristics. The 3 Vs (Volume, Variety,
and Velocity) define why big data is unique, and the additional Vs (Veracity and Value) highlight its
challenges and usefulness.
Since you're on the AI/ML path, understanding big data is crucial, especially for working with AI
models, data pipelines, and real-time analytics.
2. Data processing skills matter – Learning tools like SQL, Pandas, Spark, and Hadoop will help
handle big data.
3. Quality over quantity – Even with big data, poor-quality input leads to bad AI models.
4. Big data + AI = Smart automation – Many AI applications (e.g., recommendation systems, fraud
detection) use big data.
Are you interested in diving deeper into big data engineering, or do you want to focus more on AI
applications that use big data?
write in simple language
Big data means working with very large, complex, and fast-growing datasets that cannot be handled
by regular computers.
3 Vs of Big Data
1. Volume – Huge amounts of data are generated daily. Example: Amazon processes 12-19 million
orders every day.
2. Variety – Data comes in different forms like text, images, videos, and sensor data. Example:
Facebook posts include photos, videos, tags, and locations.
3. Velocity – Data is created and updated very fast. Example: YouTube videos are uploaded so
quickly that human moderation is not possible, so AI is used.
Additional Vs
4. Veracity – The accuracy and quality of data matter. Example: Incorrect data leads to wrong
business decisions.
5. Value – Data should be useful and provide benefits. Example: Netflix uses big data to give
personalized movie recommendations.
Tools like SQL, Pandas, Spark, and Hadoop help in handling big data.
AI + Big Data = Better automation and decision-making.
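As a rough check on the volume and velocity figures above, the Python sketch below converts the 12 to 19 million orders per day into orders per second and estimates how long it takes to reach the 6,000 orders mentioned in the video. It assumes orders arrive at a constant rate all day, which is a simplification; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def orders_per_second(orders_per_day):
    """Average order rate, assuming orders are spread evenly over the day."""
    return orders_per_day / SECONDS_PER_DAY


def seconds_to_reach(target_orders, orders_per_day):
    """Seconds needed to process target_orders at the average rate."""
    return target_orders / orders_per_second(orders_per_day)


low_estimate = 12_000_000   # orders per day, lower figure from the video
high_estimate = 19_000_000  # orders per day, upper figure from the video

for daily in (low_estimate, high_estimate):
    rate = orders_per_second(daily)
    wait = seconds_to_reach(6_000, daily)
    print(f"{daily:,} per day: about {rate:.0f} per second, "
          f"6,000 orders in about {wait:.0f} seconds")
```

At either rate, 6,000 orders arrive in well under a minute, which is consistent with the claim in the video.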
Best infinite free note maker to store these, flexible across PC, iPad, and mobile
Here are some of the best infinite free note-taking apps that sync across PC, iPad, and mobile:
Key Features:
Infinite pages & databases
Key Features:
Offline-first, markdown-based
Final Recommendation