Overview of The Data Analyst Ecosystem-En

Uploaded by

majobadano
A data analyst's ecosystem includes the infrastructure, software, tools, frameworks, and processes used to gather, clean, analyze, mine, and visualize data.
In this video, we will take a quick look at the ecosystem before going into the details of each of these topics in subsequent videos.
Let’s first talk about data.
Based on how well-defined the structure of the data is, data can be categorized as structured, semi-structured, or unstructured.
Data that follows a rigid format and can be organized neatly into rows and columns is structured data.
This is the data that you typically see in databases and spreadsheets, for example.
Semi-structured data is a mix of data that has consistent characteristics and data that doesn’t conform to a rigid structure.
For example, emails.
An email contains structured data, such as the names of the sender and recipient, alongside the contents of the email, which is unstructured data.
And then there is unstructured data: data that is complex and mostly qualitative, and that is impossible to reduce to rows and columns.
For example, photos, videos, text files, PDFs, and social media content.
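To make the difference concrete, here is a minimal Python sketch using only the standard library. The sample CSV text and the sample email are made up for illustration; they are not from the course. It shows structured data mapping onto rows and columns, and a semi-structured email splitting into well-defined headers plus a free-text body.

```python
import csv
import io
from email import message_from_string

# Structured: every record has the same fields, so it maps onto rows and columns.
# (Hypothetical sample data.)
csv_text = "order_id,customer,amount\n1,Ana,19.99\n2,Ben,5.00\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: an email has well-defined headers plus a free-text body.
# (Hypothetical sample message.)
raw_email = (
    "From: ana@example.com\n"
    "To: ben@example.com\n"
    "Subject: Invoice question\n"
    "\n"
    "Hi Ben, could you check the total on my last order?\n"
)
msg = message_from_string(raw_email)

# The headers behave like structured fields...
headers = {"from": msg["From"], "to": msg["To"], "subject": msg["Subject"]}
# ...while the body is unstructured text with no fixed schema.
body = msg.get_payload()

print(rows[0]["customer"])
print(headers["subject"])
print(body.strip())
```

The headers can be queried like columns, but the body would need text processing before it could be analyzed, which is why the type of data affects the choice of tools.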
The type of data drives the kind of data repositories that the data can be collected and stored in, and also the tools that can be used to query or process the data.
Data also comes in a wide variety of file formats and is collected from a variety of data sources, ranging from relational and non-relational databases to APIs, web services, data streams, social platforms, and sensor devices.
This brings us to data repositories, a term that includes databases, data warehouses, data marts, data lakes, and big data stores.
The type, format, and sources of data influence the type of data repositories that you can use to collect, store, clean, analyze, and mine the data.
If you’re working with big data, for example, you will need big data warehouses that allow you to store and process large-volume, high-velocity data, and also frameworks that allow you to perform complex analytics in real time on big data.
The ecosystem also includes languages that can be classified as query languages, programming languages, and shell and scripting languages.
From querying and manipulating data with SQL to developing data applications with Python and writing shell scripts for repetitive operational tasks, these are important components in a data analyst’s workbench.
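As a small illustration of how a query language and a programming language work together, here is a sketch that runs a SQL query from Python using the built-in sqlite3 module. The `sales` table and its values are hypothetical, chosen only to keep the example self-contained.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, for illustration only.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0)],
)

# SQL does the querying and aggregation; Python receives the result rows.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(totals)  # [('North', 150.0), ('South', 80.0)]
```

Here SQL handles the grouping and summing, and Python is the place where the results could be passed on to further analysis or visualization.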
Automated tools, frameworks, and processes for all stages of the analytics process are part of the data analyst's ecosystem.
From tools used for gathering, extracting, transforming, and loading data into data repositories, to tools for data wrangling, data cleaning, data mining, analysis, and data visualization, it's a very diverse and rich ecosystem.
Spreadsheets, Jupyter Notebooks, and IBM Cognos are just a few examples.
We will cover some of the data analytics tools in greater detail in subsequent sections of the course.
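The extracting, transforming, and loading steps mentioned earlier can be sketched in a few lines of Python. This is only a toy example under stated assumptions: the raw CSV text, the cleaning rules, and the `purchases` table are all hypothetical, and real pipelines typically rely on dedicated tools.

```python
import csv
import io
import sqlite3

# Hypothetical raw input with stray whitespace and one missing amount.
raw = "name,city,amount\n Ana ,lima,19.99\nBen,Quito,\nCara, bogota ,7.50\n"

# Extract: read the raw records from the source.
records = list(csv.DictReader(io.StringIO(raw)))

# Transform: trim whitespace, normalize city names, drop rows with no amount.
clean = []
for record in records:
    amount = record["amount"].strip()
    if not amount:
        continue  # assumed rule: skip rows that have no amount
    clean.append(
        (record["name"].strip(), record["city"].strip().title(), float(amount))
    )

# Load: write the cleaned rows into a repository (here, an in-memory database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (name TEXT, city TEXT, amount REAL)")
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", clean)
loaded = conn.execute(
    "SELECT name, city, amount FROM purchases ORDER BY name"
).fetchall()
conn.close()

print(loaded)
```

The row with the missing amount is dropped during the transform step, so only two cleaned rows reach the database.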