
Course Summary - Introduction to Data Analytics

Modern Data Ecosystem


Data is available in a variety of structured and unstructured datasets, residing in text, images, videos, click streams, user conversations, Internet of Things (IoT) devices, social media platforms, real-time events that stream data, legacy databases, and data sourced from professional data providers and agencies. The sources have never before been so diverse and dynamic.

Data analyst

Data analysts translate data and numbers into plain language so that organizations can make decisions. Data analysts inspect and clean data to:

derive insights
identify correlations
find patterns
apply statistical methods

They analyze and mine data, and visualize it to interpret and present the findings of the data analysis.

New technologies

like cloud computing, machine learning, and big data have a significant influence on the data ecosystem, providing access to limitless storage, powerful computing,
and advanced tools for data analysis.

Data Analytics

Data analytics is the process of gathering, cleaning, analyzing and mining data, interpreting results, and reporting the findings.

Types of Data Analytics

Descriptive Analytics "What happened?"

Diagnostic Analytics "Why did it happen?"

Predictive Analytics "What will happen next?"

Prescriptive Analytics "What should be done about it?"

The Data Analytics Process

* Understanding the problem and desired result
* Setting a clear metric
* Gathering data
* Cleaning data
* Analyzing and mining data
* Interpreting results
* Communicating the findings

Data Analysis vs. Data Analytics

Definition: Data Analysis is the detailed examination of the elements or structure of something; Data Analytics is the systematic computational analysis of data or statistics.

Use of Numbers: Data Analysis can be done without numbers or data (e.g., business analysis, psychoanalysis); Data Analytics almost invariably implies the use of data for numerical manipulation and inference.

Historical Data: Data Analysis is often based on inferences from historical data; Data Analytics is not limited to historical data and can include predictive elements.

Responsibilities of Data Analyst


* Acquiring data from primary and secondary data sources
* Creating queries to extract required data from databases and other data collection systems
* Filtering, cleaning, standardizing, and reorganizing data in preparation for data analysis
* Using statistical tools to interpret data sets and to identify patterns and correlations in data
* Analyzing patterns in complex data sets and interpreting trends
* Preparing reports and charts that effectively communicate trends and patterns
* Creating appropriate documentation to define and demonstrate the steps of the data analysis process

Applications of Data Analytics

* Analyzing commercial content to identify and share information
* Monitoring health metrics, such as sugar levels for diabetes patients
* Every industry, including airlines, pharmaceuticals, banking, etc., can benefit from data analytics such as sales pipeline analysis, financial reporting, headcount planning
and review
* Companies are using data analytics to understand changes in customer buying habits during the pandemic
* Analytics helps companies pivot and cater to changing demand
* Sentiment analysis of tweets and news stories to inform investment decisions
* Satellite imagery to track the development of industrial activities
* Geolocation data to track store traffic and predict sales volume

Skills required for Data Analysts


Technical Skills Functional Skills Soft Skills

Proficiency in spreadsheets, statistical and visualization tools, Understanding of statistics, analytical Collaboration, effective communication,
programming, querying languages, and working with various techniques, problem-solving, data visualization, storytelling with data, stakeholder engagement,
data repositories and big data platforms. and project management. curiosity, and intuition.

Types of Data
Data is unorganized information that is processed to make it meaningful.

Structured Data
Characteristics: Well-defined structure, tabular format, schemas.
Examples of sources: SQL databases, spreadsheets, online forms, sensors, logs.
Storage and analysis: Relational or SQL databases; standard analysis.

Semi-Structured Data
Characteristics: Some organizational properties, metadata-driven.
Examples of sources: E-mails, XML, binary executables, data integration.
Storage and analysis: XML and JSON, with metadata used for grouping and hierarchy.

Unstructured Data
Characteristics: Lacks a specific structure; no mainstream database fit.
Examples of sources: Web pages, social media feeds, images, audio/video, documents.
Storage and analysis: Files, NoSQL databases; specialized analysis.

Different File Formats

Data professionals work with a variety of data file types and formats, including delimited text files (CSV and TSV), Microsoft Excel (XLSX), XML, PDF, and JSON.
These formats are used for storing, organizing, and sharing data in different ways, offering flexibility and compatibility with a wide range of applications and systems.
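
For illustration, a minimal Python sketch of loading some of these formats into tabular form with pandas; the file names are hypothetical, and pandas (plus an Excel engine such as openpyxl) is assumed to be installed.

    # Minimal sketch: reading common file formats into DataFrames.
    import pandas as pd

    # Delimited text files: CSV uses commas, TSV uses tabs.
    df_csv = pd.read_csv("sales.csv")              # hypothetical file
    df_tsv = pd.read_csv("sales.tsv", sep="\t")    # hypothetical file

    # Microsoft Excel workbook (XLSX); sheet_name selects a worksheet.
    df_xlsx = pd.read_excel("sales.xlsx", sheet_name=0)

    # JSON, e.g. a list of records.
    df_json = pd.read_json("sales.json")

    print(df_csv.head())  # quick look at the first rows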

Data Sources

Relational Databases: Systems like SQL Server, Oracle, MySQL, and IBM DB2, used for structured data storage.
Flat Files & XML Datasets: Plain text formats with delimited values (CSV, TSV) or hierarchical structures (XML) for data organization.
APIs and Web Services: Interfaces for interacting with data providers or applications, returning data in various formats.
Web Scraping: Techniques for extracting specific data from web pages based on parameters, using tools like BeautifulSoup, Scrapy, and Selenium (see the sketch after this list).
Data Streams: Continuous flows of data from various sources (IoT devices, GPS data, web clicks, etc.), often timestamped and geo-tagged.
RSS Feeds: Sources for capturing updated data from forums and news sites, streamed to user devices via a feed reader.
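
As an illustration of the web scraping approach above, here is a minimal sketch using requests and BeautifulSoup; the URL and the CSS class are hypothetical, and real pages need their own selectors, plus a check of the site's terms of use and robots.txt.

    # Minimal web-scraping sketch (hypothetical URL and selector).
    import requests
    from bs4 import BeautifulSoup

    url = "https://example.com/products"  # hypothetical page
    response = requests.get(url, timeout=10)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")

    # Extract the text of every <h2> element with class "product-name".
    names = [tag.get_text(strip=True)
             for tag in soup.find_all("h2", class_="product-name")]
    print(names)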

Data Repositories
A Data Repository is a general term that refers to data that has been collected, organized, and isolated so that it can be used for reporting, analytics, and also for archival
purposes. The different types of Data Repositories include:

Databases, which can be relational or non-relational, each following organizational principles based on the kind of data they can store and the tools used to query, organize, and retrieve data.

Data Lakes, that serve as storage repositories for large amounts of structured, semi-structured, and unstructured data in their native format.

Big Data Stores, that provide distributed computational and storage infrastructure to store, scale, and process very large data sets.

Data Warehouses, that consolidate incoming data into one comprehensive storehouse.

Data Marts, that are essentially sub-sections of a data warehouse, built to isolate data for a particular business function or use case.

Extract, Transform, and Load

The ETL process is an automated process that converts raw data into analysis-ready data by:

* Extracting data from source locations
* Transforming raw data by cleaning, enriching, standardizing, and validating it
* Loading the processed data into a destination system or data repository

Data Pipeline, sometimes used interchangeably with ETL, encompasses the entire journey of moving data from the source to a destination data lake or application, using
the ETL process.
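
To make the ETL steps concrete, here is a minimal sketch in Python with pandas and SQLite; the source file, column names, and destination database are hypothetical.

    # Minimal ETL sketch: CSV source -> cleaned DataFrame -> SQLite destination.
    import sqlite3
    import pandas as pd

    # Extract: read raw data from the source location (hypothetical file).
    raw = pd.read_csv("raw_orders.csv")

    # Transform: clean, standardize, and validate (hypothetical columns).
    clean = raw.drop_duplicates()
    clean = clean.dropna(subset=["order_id", "amount"])          # drop incomplete rows
    clean["country"] = clean["country"].str.strip().str.upper()  # standardize text
    clean = clean[clean["amount"] > 0]                           # validate values

    # Load: write analysis-ready data into the destination repository.
    with sqlite3.connect("analytics.db") as conn:
        clean.to_sql("orders", conn, if_exists="replace", index=False)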

Data sources can be internal or external to the organization, and they can be primary, secondary, or third-party, depending on whether you are obtaining the data directly
from the original source, retrieving it from externally available data sources, or purchasing it from data aggregators.

Data that has been identified and gathered from the various data sources is combined using a variety of tools and methods to provide a single interface through which the
data can be queried and manipulated.

The data you identify, the source of that data, and the practices you employ for gathering the data have implications for quality, security, and privacy, which need to be
considered at this stage.

Data Wrangling
Data Wrangling is an iterative process that involves data exploration, transformation, and validation. Typical tasks include the following (see the sketch after this list):
* Structurally manipulate and combine the data using Joins and Unions.

* Normalize data, that is, clean the database of unused and redundant data.

* Denormalize data, that is, combine data from multiple tables into a single table so that it can be queried faster.

* Clean data, which involves profiling data to uncover quality issues, visualizing data to spot outliers, and fixing issues such as missing
values, duplicate data, irrelevant data, inconsistent formats, syntax errors, and outliers.

* Enrich data, which involves considering additional data points that could add value to the existing data set and lead to a more
meaningful analysis.
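
A minimal data wrangling sketch with pandas, using hypothetical tables and column names, covering a join, cleaning, and enrichment.

    # Minimal wrangling sketch on small, hypothetical DataFrames.
    import pandas as pd

    orders = pd.DataFrame({
        "order_id": [1, 2, 2, 3],
        "customer_id": [10, 11, 11, 12],
        "amount": [100.0, None, None, 250.0],
    })
    customers = pd.DataFrame({
        "customer_id": [10, 11, 12],
        "region": ["EMEA", "AMER", "apac"],
    })

    # Structurally combine the tables (a join), producing a denormalized view.
    combined = orders.merge(customers, on="customer_id", how="left")

    # Clean: remove duplicates, fix inconsistent formats, handle missing values.
    combined = combined.drop_duplicates(subset="order_id")
    combined["region"] = combined["region"].str.upper()
    combined["amount"] = combined["amount"].fillna(combined["amount"].median())

    # Enrich: add a derived data point that could make the analysis more meaningful.
    combined["is_large_order"] = combined["amount"] > 200

    print(combined)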

Statistical Analysis
Statistical Analysis involves the use of statistical methods in order to develop an understanding of what the data represents.

Descriptive statistical analysis: provides a summary of what the data represents. Common measures include Central Tendency, Dispersion, and Skewness.

Inferential statistical analysis: involves making inferences, or generalizations, about data. Common measures include Hypothesis Testing, Confidence Intervals, and
Regression Analysis.
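
A minimal sketch of both kinds of analysis, using hypothetical sample values and assuming pandas and SciPy are installed.

    # Descriptive and inferential statistics on hypothetical samples.
    import pandas as pd
    from scipy import stats

    sample_a = pd.Series([12.1, 13.4, 11.8, 12.9, 13.1, 12.5])
    sample_b = pd.Series([13.0, 14.2, 13.6, 13.9, 14.5, 13.3])

    # Descriptive statistics: central tendency, dispersion, and skewness.
    print("mean:", sample_b.mean())
    print("std dev:", sample_b.std())
    print("skewness:", sample_b.skew())

    # Inferential statistics: a two-sample t-test (hypothesis testing)
    # comparing the means of the two samples.
    t_stat, p_value = stats.ttest_ind(sample_a, sample_b)
    print("t statistic:", t_stat, "p-value:", p_value)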

Data Mining

Data Mining, simply put, is the process of extracting knowledge from data. It involves the use of pattern recognition technologies, statistical analysis, and mathematical
techniques, in order to identify correlations, patterns, variations, and trends in data.
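
As a small illustration of looking for correlations, here is a sketch using a pandas correlation matrix on hypothetical data; real data mining typically combines many such techniques.

    # Pairwise Pearson correlations on hypothetical columns; values near +1 or -1
    # suggest a linear relationship worth investigating further.
    import pandas as pd

    df = pd.DataFrame({
        "ad_spend": [10, 20, 30, 40, 50],
        "site_visits": [110, 190, 310, 420, 480],
        "support_tickets": [5, 7, 4, 6, 5],
    })
    print(df.corr())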

Data Visualization
Data visualization is the discipline of communicating information through the use of visual elements such as graphs, charts, and maps. The goal of visualizing data is to
make information easy to comprehend, interpret, and retain.
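
For example, a minimal line chart with matplotlib (the figures are hypothetical), showing how a simple visual makes a trend easy to interpret.

    # Minimal visualization sketch: monthly sales as a line chart.
    import matplotlib.pyplot as plt

    months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
    sales = [120, 135, 128, 150, 170, 165]  # hypothetical values

    plt.figure(figsize=(6, 3))
    plt.plot(months, sales, marker="o")
    plt.title("Monthly Sales")   # a clear title aids interpretation
    plt.xlabel("Month")
    plt.ylabel("Units sold")
    plt.tight_layout()
    plt.show()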

© IBM Corporation. All rights reserved.
