Data Analyst Cheat Sheet
CONTENT:
1. MySQL
2. Python
3. Excel
4. Tableau
5. Microsoft Power BI
6. IBM Cognos
7. Microsoft Visio
8. Google Looker Studio
MySQL
1. Basics
Connect to a Database
USE database_name;
Explanation: Switches to the specified database for running queries.
Describe a Table
DESCRIBE table_name;
Explanation: Shows the structure of the table (columns, data types, keys).
2. Retrieving Data
Select All Columns
SELECT * FROM table_name;
Explanation: Retrieves all rows and columns from the table.
Limit Results
SELECT * FROM table_name LIMIT 10;
Explanation: Limits the number of rows returned.
Comparison Operators
Operator Description Example
3. Aggregating Data
Count Rows
SELECT COUNT(*) FROM table_name;
Explanation: Returns the total number of rows.
Sum Values
SELECT SUM(column_name) FROM table_name;
Explanation: Calculates the sum of numeric values.
Average
SELECT AVG(column_name) FROM table_name;
Explanation: Computes the average of a numeric column.
Group By
SELECT column_name, COUNT(*)
FROM table_name
GROUP BY column_name;
Explanation: Groups rows sharing a common value and performs aggregate functions.
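A minimal sketch of GROUP BY in action. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, so the example runs without a server); the sales table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Tools", 20.0), ("Tools", 30.0), ("Garden", 15.0)],  # hypothetical rows
)

# One output row per category, with a row count and a total.
rows = conn.execute(
    """
    SELECT product_category, COUNT(*) AS n_rows, SUM(amount) AS total_sales
    FROM sales
    GROUP BY product_category
    ORDER BY product_category
    """
).fetchall()

for category, n_rows, total_sales in rows:
    print(category, n_rows, total_sales)
conn.close()
```

Every column in the SELECT list is either the grouping column or wrapped in an aggregate, which is the pattern GROUP BY queries normally follow.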
5. Joins
Inner Join
SELECT a.column1, b.column2
FROM table1 a
INNER JOIN table2 b
ON a.id = b.id;
Explanation: Returns rows with matching values in both tables.
Left Join
SELECT a.column1, b.column2
FROM table1 a
LEFT JOIN table2 b
ON a.id = b.id;
Explanation: Returns all rows from the left table and matched rows from the right table.
Right Join
SELECT a.column1, b.column2
FROM table1 a
RIGHT JOIN table2 b
ON a.id = b.id;
Explanation: Returns all rows from the right table and matched rows from the left table.
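The difference between INNER and LEFT joins is easiest to see on a tiny dataset. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption); the customers and orders tables are hypothetical. RIGHT JOIN is left out because older SQLite versions do not support it; it behaves like a LEFT JOIN with the table order swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection (assumption)
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 99.0);
    """
)

# INNER JOIN: only customers that have at least one matching order.
inner_rows = conn.execute(
    "SELECT c.name, o.total FROM customers c "
    "INNER JOIN orders o ON c.id = o.customer_id ORDER BY c.id"
).fetchall()

# LEFT JOIN: every customer; order columns are NULL (None) when nothing matches.
left_rows = conn.execute(
    "SELECT c.name, o.total FROM customers c "
    "LEFT JOIN orders o ON c.id = o.customer_id ORDER BY c.id"
).fetchall()

print(inner_rows)  # [('Alice', 99.0)]
print(left_rows)   # [('Alice', 99.0), ('Bob', None)]
conn.close()
```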
7. Advanced Queries
Subqueries
SELECT column_name
FROM table_name
WHERE column_name IN (
SELECT column_name
FROM another_table
WHERE condition
);
Explanation: Nested query that runs within another query.
8. Data Manipulation
Insert Data
INSERT INTO table_name (column1, column2)
VALUES ('value1', 'value2');
Explanation: Adds a new row to the table.
Update Data
UPDATE table_name
SET column_name = 'value'
WHERE condition;
Explanation: Modifies existing rows.
Delete Data
DELETE FROM table_name
WHERE condition;
Explanation: Deletes rows based on a condition.
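Because UPDATE and DELETE change data in place, it helps to preview the affected rows and to work inside a transaction. The sketch below illustrates that habit with Python's built-in sqlite3 module standing in for MySQL (an assumption); the employees table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection (assumption)
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Alice", "Sales", 50000.0), ("Bob", "Sales", 52000.0), ("Cara", "IT", 60000.0)],
)
conn.commit()

# Preview: run the same WHERE clause as a SELECT before deleting anything.
to_delete = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE dept = ?", ("IT",)
).fetchone()[0]

# Run the DELETE, check how many rows it touched, then undo it.
cur = conn.execute("DELETE FROM employees WHERE dept = ?", ("IT",))
deleted = cur.rowcount
conn.rollback()  # nothing is permanent until commit()

remaining = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(to_delete, deleted, remaining)  # 1 1 3
conn.close()
```

The `?` placeholders pass values separately from the SQL text, which avoids quoting mistakes and SQL injection.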
9. Indexing
Create an Index
CREATE INDEX index_name ON table_name(column_name);
Explanation: Speeds up queries by creating an index on a column.
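One way to check whether an index is actually used is to look at the query plan before and after creating it. The sketch below does this with SQLite's EXPLAIN QUERY PLAN through Python's built-in sqlite3 module (a stand-in, by assumption; MySQL offers EXPLAIN for the same purpose). The orders table, the index name, and the helper plan_for are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")

def plan_for(sql):
    """Return SQLite's query-plan descriptions for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the description

query = "SELECT total FROM orders WHERE customer_id = 42"

plan_before = plan_for(query)  # no index yet, so the table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
plan_after = plan_for(query)   # the plan should now name the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan differs between SQLite versions, so the sketch only relies on the index name appearing in it.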
10. Best Practices
1. Use Aliases: Simplify table/column references.
SELECT t1.column_name AS alias_name FROM table_name t1;
2. Optimize Joins: Ensure indexed columns are used for join conditions.
3. Limit Large Queries: Use LIMIT for large datasets to avoid slow queries.
4. Avoid SELECT *: Only fetch required columns for efficiency.
5. Backup Data: Always back up before running DELETE or UPDATE.
Visualization:
● Bar Chart: x-axis = product_category, y-axis = total_sales.
Visualization:
● Line Chart: x-axis = date, y-axis = daily_sales.
Python
1. Python Basics
Data Types and Operations
Data Types: int, float, str, list, dict, tuple, set, bool
x = 10 # int
y = 3.14 # float
name = "Ammar" # str
items = [1, 2, 3] # list
data = {'key': 'value'} # dict
● List Comprehension:
squared = [x**2 for x in range(10)]
● Dictionary Comprehension:
squares = {x: x**2 for x in range(10)}
2. File Handling
Read File:
with open('file.txt', 'r') as f:
    content = f.read()  # Read the whole file into one string
Write File:
with open('file.txt', 'w') as f:
    f.write("Hello World")  # 'w' overwrites any existing content
3. NumPy
Array Creation
Create Arrays:
import numpy as np
arr = np.array([1, 2, 3])
zeros = np.zeros((3, 3)) # 3x3 array of zeros
ones = np.ones((2, 2)) # 2x2 array of ones
random = np.random.rand(3, 3) # Random numbers
5. NumPy Essentials
1. Array Creation
array = np.array([1, 2, 3, 4])
zeros = np.zeros((3, 3))
ones = np.ones((3, 3))
random_array = np.random.rand(3, 3)
● np.array: Create arrays.
● np.zeros, np.ones: Initialize arrays with zeros or ones.
● np.random.rand: Random array with values in [0, 1].
2. Array Operations
arr = np.array([1, 2, 3, 4])
arr_mean = arr.mean()
arr_sum = arr.sum()
arr_max = arr.max()
● .mean(): Calculate mean.
● .sum(): Sum of elements.
● .max(): Maximum value.
3. Element-wise Operations
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 + arr2 # Element-wise addition
6. Pandas Essentials
1. Data Creation
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
● pd.DataFrame: Create a DataFrame.
2. Data Inspection
df.head() # First 5 rows
df.tail() # Last 5 rows
df.info() # Summary
df.describe() # Statistical summary
3. Data Selection
df['Name'] # Select column
df[['Name', 'Age']] # Select multiple columns
df.iloc[0] # Select row by index
df.loc[df['Age'] > 25] # Filter rows
4. Data Cleaning
df.dropna() # Return a copy without rows that contain missing values
df.fillna(0) # Return a copy with NaNs replaced by 0
df.rename(columns={'Age': 'Years'}) # Return a copy with the column renamed (df itself is unchanged)
5. Aggregation
df.groupby('Name').mean() # Group and calculate mean
6. Visualization with Pandas
import matplotlib.pyplot as plt
df['Age'].plot(kind='bar')
plt.show()
7. Matplotlib Essentials
1. Basic Plot
import matplotlib.pyplot as plt
x = [1, 2, 3]
y = [4, 5, 6]
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Basic Line Plot')
plt.show()
2. Scatter Plot
plt.scatter(x, y, color='red')
plt.show()
3. Histogram
plt.hist([1, 2, 2, 3, 3, 3, 4])
plt.show()
8. Plotly Essentials
1. Interactive Line Plot
import plotly.express as px
data = {'x': [1, 2, 3], 'y': [4, 5, 6]}
fig = px.line(data, x='x', y='y', title='Interactive Line Plot')
fig.show()
3. Scatter Plot
fig = px.scatter(data, x='x', y='y', title='Scatter Plot')
fig.show()
9. Additional Tips
1. Data Wrangling:
○ Use df.apply() for custom functions.
○ Use pd.pivot_table() for multi-dimensional summaries.
2. Performance:
○ Use .to_numpy() (or the older .values) to convert to NumPy arrays for faster computation.
○ Use .iterrows() sparingly; vectorized operations are usually much faster.
3. Visualization:
○ For larger datasets, prefer Plotly over Matplotlib when you need interactivity such as zoom and hover.
Excel
🔍 1. Logical Functions
IFS
● Use: Replaces complex nested IFs for cleaner logic.
● Syntax: =IFS(condition1, result1, condition2, result2, ..., TRUE, default)
Example:
=IFS(A2>90,"Overdue", A2=90,"Due", A2<90,"Not Due")
Example:
=SUMIFS(D2:D100, A2:A100, "North", B2:B100, "John")
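As a rough illustration of what the SUMIFS example computes, the Python sketch below sums a value column over rows that satisfy every condition. The rows and the helper name sumifs are hypothetical; the sketch compares text exactly, whereas Excel's criteria matching is not case-sensitive.

```python
# Hypothetical rows standing in for columns A (region), B (rep), and D (amount).
rows = [
    ("North", "John", 100.0),
    ("North", "Mary", 250.0),
    ("South", "John", 75.0),
    ("North", "John", 40.0),
]

def sumifs(rows, region, rep):
    """Sum amounts for rows that match every condition, as SUMIFS does."""
    return sum(amount for r, p, amount in rows if r == region and p == rep)

total = sumifs(rows, "North", "John")
print(total)  # 140.0
```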
🔁 3. Lookup Functions
XLOOKUP (Modern) vs. VLOOKUP (Legacy)
● XLOOKUP Syntax:
=XLOOKUP(lookup_value, lookup_array, return_array, [if_not_found], [match_mode], [search_mode])
● VLOOKUP Syntax:
=VLOOKUP(lookup_value, table_array, col_index, FALSE)
● Advantages of XLOOKUP:
○ Looks left or right (VLOOKUP only looks right).
○ No need for column index numbers.
○ Built-in error handling.
Example:
=XLOOKUP("Gloves", A2:A100, B2:B100, "Not Found")
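The exact-match behavior of the XLOOKUP example can be sketched in Python as a paired search with a fallback. The function name xlookup and the sample lists are hypothetical; where Excel returns #N/A when if_not_found is omitted, this sketch returns None.

```python
def xlookup(lookup_value, lookup_array, return_array, if_not_found=None):
    """Exact-match lookup: return the item paired with the first match."""
    for key, value in zip(lookup_array, return_array):
        if key == lookup_value:
            return value
    return if_not_found

products = ["Hammer", "Gloves", "Drill"]  # hypothetical column A
prices = [12.5, 4.0, 89.0]                # hypothetical column B

found = xlookup("Gloves", products, prices, "Not Found")
missing = xlookup("Saw", products, prices, "Not Found")
print(found, missing)  # 4.0 Not Found
```

Because the lookup and return arrays are passed separately, the return column can sit on either side of the lookup column, which is the "looks left or right" advantage listed above.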
🧠 4. Error Handling
IFERROR / IFNA
● Use: Handle errors gracefully.
● IFERROR Syntax: =IFERROR(value, value_if_error)
● IFNA Syntax: =IFNA(value, value_if_na)
📅 5. Date Functions
EOMONTH
● Use: Get last day of the month from a date.
● Syntax: =EOMONTH(start_date, months)
● Example: =EOMONTH(TODAY(), 1) → Last day of next month
EDATE
● Use: Shift a date forward/backward by months.
● Syntax: =EDATE(start_date, months)
● Example: =EDATE(A2, -1) → Previous month
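The month arithmetic behind EOMONTH and EDATE can be sketched with Python's standard library. The function names eomonth, edate, and _shift_month are hypothetical helpers written for this illustration, not Excel or Python built-ins.

```python
import calendar
from datetime import date

def _shift_month(year, month, months):
    """Return (year, month) moved forward or backward by a number of months."""
    index = year * 12 + (month - 1) + months
    return index // 12, index % 12 + 1

def eomonth(start, months):
    """Last day of the month that is `months` away from `start`."""
    year, month = _shift_month(start.year, start.month, months)
    return date(year, month, calendar.monthrange(year, month)[1])

def edate(start, months):
    """Same day-of-month `months` away, clamped to that month's last day."""
    year, month = _shift_month(start.year, start.month, months)
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))

print(eomonth(date(2024, 1, 15), 1))  # 2024-02-29
print(edate(date(2024, 3, 31), -1))   # 2024-02-29
```

The clamping step matters at month ends: moving 31 March back one month lands on the last day of February rather than on an invalid date.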
NETWORKDAYS.INTL
● Use: Calculate workdays between two dates, including custom weekends.
● Syntax: =NETWORKDAYS.INTL(start_date, end_date, [weekend], [holidays])
● Weekend Codes: "0000011" means Mon–Fri are workdays.
Example:
=NETWORKDAYS.INTL(A2, B2, 1, {"2024-12-25","2024-12-26"})
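The counting rule behind the example above (weekend code 1, meaning Saturday and Sunday, plus two holidays) can be sketched in Python. The function name networkdays and the date range are hypothetical, and the sketch assumes the start date is not after the end date.

```python
from datetime import date, timedelta

def networkdays(start, end, weekend=(5, 6), holidays=()):
    """Count working days from start to end inclusive (assumes start <= end).

    weekend holds Python weekday numbers (Monday=0), so (5, 6) is Sat/Sun.
    """
    holidays = set(holidays)
    count = 0
    day = start
    while day <= end:
        # A day counts only if it is neither a weekend day nor a holiday.
        if day.weekday() not in weekend and day not in holidays:
            count += 1
        day += timedelta(days=1)
    return count

xmas = [date(2024, 12, 25), date(2024, 12, 26)]
workdays = networkdays(date(2024, 12, 23), date(2024, 12, 31), holidays=xmas)
print(workdays)  # 5
```

Passing a different weekend tuple, for example (4, 5) for Friday and Saturday, plays the role of Excel's other weekend codes.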
📑 6. Pivot Tools
GETPIVOTDATA
● Use: Dynamically extract values from PivotTables.
● Syntax: Excel auto-generates this.
=GETPIVOTDATA("Total", $A$3, "Region", "USA")
● Benefit: Stays accurate even if the PivotTable layout changes.
Example:
=FILTER(A2:C100, C2:C100="Sales", "No records")
UNIQUE
● Use: Return unique or distinct values.
● Syntax: =UNIQUE(array, [by_col], [exactly_once])
Example:
=UNIQUE(A2:A100) // distinct values
=UNIQUE(A2:A100,,TRUE) // only values that appear once
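The two UNIQUE modes are easy to confuse, so here is a small Python sketch of the difference. The sample list is hypothetical.

```python
from collections import Counter

values = ["North", "South", "North", "East", "South", "West"]  # hypothetical

# =UNIQUE(range): each distinct value once, in order of first appearance.
distinct = list(dict.fromkeys(values))

# =UNIQUE(range,,TRUE): only values that occur exactly once.
counts = Counter(values)
exactly_once = [v for v in values if counts[v] == 1]

print(distinct)      # ['North', 'South', 'East', 'West']
print(exactly_once)  # ['East', 'West']
```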
SORT
● Use: Sort a dataset.
● Syntax: =SORT(array, [sort_index], [sort_order], [by_col])
Example:
=SORT(A2:C100, 2, 1) // sort by 2nd column, ascending
SEQUENCE
● Use: Generate a series of numbers or dates.
● Syntax: =SEQUENCE(rows, [columns], [start], [step])
Example:
=SEQUENCE(5,1,1,2) → 1, 3, 5, 7, 9
How it Works:
● Data → From Table/Range → Power Query Editor opens.
● Each step you take is recorded as a transformation.
● Press “Close & Load” to save back to Excel.
🆚 Feature Comparisons
Feature | VLOOKUP | XLOOKUP | INDEX + MATCH
Look Left? | ❌ | ✅ | ✅
Easier to Read | ✅ | ✅ | ❌
Flexible | ❌ | ✅ | ✅
Error Handling | ❌ | ✅ (if_not_found) | ✅ (with IFERROR)
1. Navigation & Shortcuts
● CTRL + Arrow Keys: Jump to the edge of a range of data.
● CTRL + SHIFT + Arrow Keys: Select a range of cells.
● CTRL + SPACE: Select an entire column.
● SHIFT + SPACE: Select an entire row.
● ALT + =: Auto-sum selected cells.
Tip: Use shortcuts to speed up your workflow and navigate large datasets efficiently.
2. Data Cleaning
Remove Duplicates
● Path: Data → Remove Duplicates.
● Use Case: Identify and remove duplicate rows based on one or more columns.
● Example:
Original Data:
Name Age City
Alice 30 Toronto
After Removing Duplicates:
Name Age City
Alice 30 Toronto
TRIM Function
● Syntax: =TRIM(A1)
● Use Case: Removes unnecessary spaces from text.
● Example:
○ Input: " Hello World "
○ Output: Hello World
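TRIM does more than strip the ends: it also collapses runs of spaces inside the text to a single space. A minimal Python sketch of that behavior, with the hypothetical helper name excel_trim:

```python
def excel_trim(text):
    """Strip leading/trailing spaces and collapse internal runs to one space."""
    return " ".join(part for part in text.split(" ") if part)

cleaned = excel_trim("  Hello    World  ")
print(repr(cleaned))  # 'Hello World'
```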
Text-to-Columns
● Path: Data → Text to Columns.
● Use Case: Split text in one column into multiple columns based on a delimiter (e.g., commas, spaces).
Logical Functions
● IF Function:
○ Syntax: =IF(condition, value_if_true, value_if_false)
○ Example: =IF(A1>50, "Pass", "Fail").
○ Input: A1 = 60
○ Output: Pass.
● AND Function:
○ Syntax: =AND(condition1, condition2, ...)
○ Example: =AND(A1>50, B1<100).
● OR Function:
○ Syntax: =OR(condition1, condition2, ...)
○ Example: =OR(A1>50, B1<100).
Lookup Functions
● VLOOKUP: Searches for a value in the first column of a range and returns a value in the same row.
○ Syntax: =VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup]).
Example:
ID Name Salary
4. Data Visualization
Creating Charts
● Path: Insert → Select Chart Type (e.g., Line, Bar, Pie).
● Tips:
○ Use bar charts for categorical data.
○ Use line charts for trends over time.
○ Use scatter plots for relationships between two variables.
Pivot Tables
● Path: Insert → PivotTable.
● Use Case: Summarize large datasets.
● Steps:
1. Select your data.
2. Drag fields into Rows, Columns, Values, and Filters.
3. Use slicers to interactively filter data.
Conditional Formatting
● Path: Home → Conditional Formatting.
● Examples:
○ Highlight values >50: Highlight Cell Rules → Greater Than.
○ Color scales for numeric ranges: Color Scales.
Solver
● Path: Data → Solver.
● Use Case: Solve optimization problems.
Descriptive Statistics
● Path: Data → Data Analysis → Descriptive Statistics.
● Use Case: Generate mean, median, variance, etc., for a dataset.
6. Power Query
● Path: Data → Get & Transform Data → From Table/Range.
● Use Case: Clean and transform data without manual effort.
● Example:
○ Combine multiple files into one dataset.
○ Remove duplicates or filter rows automatically.
7. Advanced Techniques
Dynamic Named Ranges
● Use Case: Create a range that adjusts automatically as data changes.
● Steps:
1. Define a name (Formulas → Name Manager).
2. Use the formula: =OFFSET(Sheet1!$A$1, 0, 0, COUNTA(Sheet1!$A:$A), 1).
Array Formulas
● Use Case: Perform calculations across multiple cells.
● Example:
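○ {=SUM(A1:A10*B1:B10)}: Multiplies the two ranges pairwise and sums the products (a weighted sum). In Excel versions without dynamic arrays, confirm with Ctrl + Shift + Enter; the braces are added by Excel, not typed.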
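The calculation an array formula like SUM(A1:A10*B1:B10) performs can be sketched in Python as a pairwise multiply followed by a sum. The two short lists are hypothetical stand-ins for the ranges.

```python
values = [10, 20, 30]  # hypothetical stand-in for A1:A3
weights = [1, 2, 3]    # hypothetical stand-in for B1:B3

# Multiply pairwise, then add the products: the same steps as SUM(A*B).
weighted_sum = sum(v * w for v, w in zip(values, weights))
print(weighted_sum)  # 140
```

In Excel, =SUMPRODUCT(A1:A10, B1:B10) gives the same result without array entry.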
Data Validation
● Path: Data → Data Validation.
● Use Case: Restrict user input.
● Example: Create a dropdown list.
8. Useful Tips
1. Save as Table:
○ Path: Insert → Table.
○ Advantage: Formulas and charts that reference the table expand automatically as rows are added; PivotTables built on it include the new rows after a refresh.
2. Freeze Panes:
○ Path: View → Freeze Panes.
○ Locks rows or columns for easier navigation.
3. Use Named Ranges:
○ Define ranges for readability and easier reference.
9. Keyboard Shortcuts
Shortcut Action
Tableau
Parameters:
● Dynamic values for interactive dashboards.
○ Create a parameter → Add it to a calculated field.
○ Example: Parameter to toggle between showing Sales or Profit.
6. Dashboard Tips
● Combine Views: Drag multiple sheets into a single dashboard.
● Add Interactivity:
○ Use Actions for filters, highlights, or URL navigation.
○ Example: Clicking a region filters charts to show region-specific data.
● Best Practices:
○ Keep dashboards clean and minimal.
○ Use consistent color schemes and fonts.
7. Advanced Features
Level of Detail (LOD) Expressions:
● Syntax: { FIXED [Dimension] : SUM([Measure]) }
○ FIXED: Calculate at a specific granularity.
○ INCLUDE: Include additional dimensions in aggregation.
○ EXCLUDE: Ignore dimensions.
● Example:
○ { FIXED [Region] : SUM([Sales]) }: Total Sales by Region.
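Conceptually, a FIXED expression aggregates at its own grain and then makes that value available on every row, whatever dimensions the view uses. The Python sketch below illustrates the idea for { FIXED [Region] : SUM([Sales]) }; the rows and the "Region Sales" field name are hypothetical, and this is an analogy rather than how Tableau computes it internally.

```python
from collections import defaultdict

# Hypothetical rows at a finer grain (City) than the FIXED dimension (Region).
rows = [
    {"Region": "East", "City": "Boston", "Sales": 100},
    {"Region": "East", "City": "Albany", "Sales": 50},
    {"Region": "West", "City": "Reno", "Sales": 70},
]

# Step 1: aggregate at the fixed grain (Region), ignoring other dimensions.
region_total = defaultdict(int)
for row in rows:
    region_total[row["Region"]] += row["Sales"]

# Step 2: attach that total to every row, whatever the row's own grain.
for row in rows:
    row["Region Sales"] = region_total[row["Region"]]

print([r["Region Sales"] for r in rows])  # [150, 150, 70]
```

With the regional total on each row, a ratio such as Sales / Region Sales gives each city's share of its region.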
Blending Data:
● Combine data from multiple sources using a common field.
● Example: Blend Salesforce data with Excel.
Microsoft Power BI
3. Visualizations
1. Types of Visuals:
○ Bar Chart: Compare categories.
○ Line Chart: Track trends over time.
○ Scatter Plot: Identify relationships between variables.
○ Pie Chart: Show proportions.
○ Map: Visualize geographical data.
2. Customizing Visuals:
○ Filters: Add filters directly to visuals.
○ Formatting:
■ Adjust axes, labels, and colors.
■ Example: Use Data Colors to highlight key data points.
○ Tooltips:
■ Enhance insights by showing additional information when hovering over data points.
3. Visual Hierarchies:
○ Drag fields to create drill-down capabilities.
○ Example:
■ Year > Quarter > Month drill-down in a Line Chart.
4. Relationships
1. Model Relationships:
○ Found in the Model view.
○ Types:
■ One-to-Many (most common).
■ Many-to-Many.
○ Example:
■ Link Sales table to Products table via ProductID.
2. Edit Relationships:
○ Double-click the relationship line > Set Cardinality and Cross-filter Direction.
6. Performance Optimization
1. Data Reduction:
○ Remove unnecessary columns and rows.
○ Use summarized data for analysis.
2. Efficient DAX:
○ Avoid using CALCULATE within calculated columns.
○ Use measures wherever possible.
3. Pre-Aggregated Tables:
○ Pre-aggregate data before importing to Power BI.
8. Advanced Analytics
1. Bookmarks:
○ Capture a view of the report for storytelling.
○ View Tab > Bookmarks.
2. Drillthrough Pages:
○ Create dedicated pages for detailed insights.
○ Example:
■ Drillthrough to view sales by individual customer.
3. What-If Parameters:
○ Add interactive sliders for scenario analysis.
9. Keyboard Shortcuts
1. Common Shortcuts:
○ Open File: Ctrl + O
○ Save: Ctrl + S
○ Add Visualization: Drag fields to canvas.
○ Undo: Ctrl + Z
○ Redo: Ctrl + Y
IBM Cognos
4. Report Building
● Creating a New Report:
○ Use the "+" button to start a new report.
○ Select a layout: Tabular, Chart, Crosstab, or Custom.
● Common Report Elements:
○ Text Items: Add headers or explanatory text.
○ Lists: Tabular view for data.
○ Crosstabs: Data comparison with rows and columns.
○ Charts: Visualize trends and comparisons.
● Report Filters:
○ Static Filters: Predefined conditions.
○ Dynamic Filters: Allow user interaction.
■ Example: Filter sales by year using a dropdown.
● Conditional Formatting:
○ Highlight cells based on conditions.
○ Example: Highlight sales > $10,000 in green:
1. Select column.
2. Apply "Conditional Style."
3. Define rule: Sales > 10,000.
5. Data Visualization Tips
● Types of Visualizations:
○ Line Charts: For trends over time.
○ Bar Charts: For comparing categories.
○ Pie Charts: For proportions.
○ Heatmaps: For density or magnitude.
● Customization Options:
○ Change colors, add data labels, and adjust axes for clarity.
○ Example: In a bar chart, right-click the axis to rename it for better understanding.
6. Advanced Analytics
● Calculations:
○ Basic Calculations:
■ SUM(column_name)
■ AVERAGE(column_name)
■ Example: TOTAL(Sales) / COUNT(Region) for average sales by region.
○ Custom Expressions:
■ CASE WHEN condition THEN value ELSE value END
■ Example: CASE WHEN Sales > 5000 THEN 'High' ELSE 'Low' END.
● Drill-Through Reports:
○ Create links to detailed reports.
○ Use "Drill-Through Definitions" to pass context (e.g., Region -> Region Details).
● Forecasting:
○ Enable "Predictive Analytics" to forecast future trends based on historical data.
9. Optimization Tips
● Use data caching to speed up queries.
● Reduce the number of joins for faster performance.
● Aggregate data at the source before importing.
10. Troubleshooting
● Common Errors:
○ Data mismatch: Ensure all columns used in joins have compatible data types.
○ Report rendering issues: Check dataset size and reduce unnecessary calculations.
● Debugging Tools:
○ Use the Validation feature to check queries and reports.
Refresh Data F5
Microsoft Visio
5. Advanced Tips
Data Graphics
● Overlay dynamic data onto shapes for enhanced visualization.
○ Steps:
1. Import data.
2. Go to Data > Display Data Graphics.
3. Customize display options (text, icons, or bars).
Layer Management
● Organize and control visibility of diagram elements.
○ Steps:
1. Navigate to Home > Layers > Layer Properties.
2. Create new layers and assign shapes.
Validation
● Ensure diagram accuracy by checking rules.
○ Go to Process > Check Diagram to validate against predefined rules (e.g., for flowcharts or BPMN diagrams).
6. Visualization Examples
Flowchart Example
[Start] --> [Process A] --> [Decision?]
                             |       |
                             V       V
                           [Yes]   [No]
● Use Decision Diamonds to branch flows.
ER Diagram Example
Entity Attributes
8. Troubleshooting Tips
● Slow Performance:
○ Reduce file size by simplifying diagrams.
○ Turn off shape shadows and 3D effects.
● Connector Issues:
○ Ensure shapes are properly grouped.
○ Use Ctrl + Shift + O to view connection points.
Google Looker Studio
1. Data Connections
● Connect to Data Sources:
○ Looker Studio supports over 800 data connectors.
○ Popular options: Google Sheets, BigQuery, Google Analytics, MySQL, PostgreSQL.
○ Use Extract Data Connector to cache frequently used data for faster dashboards.
● Tips:
○ Ensure datasets have proper field naming conventions.
○ Pre-clean your data to avoid unnecessary calculations.
2. Data Transformation
● Calculated Fields:
○ Create new fields using Looker Studio's formulas.
○ Example:
■ SUM(Sales): Total sales.
■ CASE WHEN (Condition) THEN (Result):
CASE
WHEN Age > 18 THEN "Adult"
ELSE "Minor"
END
● Blending Data:
○ Combine datasets on common keys (e.g., blending sales and customer demographics using CustomerID).
○ Ensure the join key is consistent across datasets.
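The calculated field and the blend described above can be sketched together in Python. The function name age_group, the sample records, and the choice of a left join (every sales row kept, unmatched customers left blank) are assumptions made for illustration, not Looker Studio internals.

```python
def age_group(age):
    """Mirror of the CASE expression: the first matching branch wins."""
    if age > 18:
        return "Adult"
    return "Minor"

# Hypothetical datasets sharing the CustomerID key.
sales = [
    {"CustomerID": 1, "Sales": 120},
    {"CustomerID": 2, "Sales": 80},
    {"CustomerID": 3, "Sales": 40},
]
customers = {1: {"Age": 34}, 2: {"Age": 17}}  # no entry for CustomerID 3

# Left-join style blend: keep every sales row, fill in Age where the key matches.
blended = []
for row in sales:
    match = customers.get(row["CustomerID"])
    age = match["Age"] if match else None
    group = age_group(age) if age is not None else None
    blended.append({**row, "Age": age, "Group": group})

print([r["Group"] for r in blended])  # ['Adult', 'Minor', None]
```

Note that the condition Age > 18 classifies an 18-year-old as "Minor"; use >= if 18 should count as an adult. The unmatched CustomerID 3 shows why the join key needs to be consistent across datasets.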
3. Visualization Options
● Charts:
○ Time Series Chart: Trends over time.
○ Bar/Column Charts: Compare categories.
○ Pie Charts: Show proportions.
○ Geo Maps: Visualize data by location.
● Pro Tip: Avoid pie charts for more than 5 categories; use bar or stacked bar charts instead.
● Interactive Elements:
○ Use filters and controls (e.g., dropdowns, date range pickers) for dynamic dashboards.
○ Set default filters for the most common views.
4. Best Practices for Dashboards
● Design:
○ Keep dashboards clean and intuitive.
○ Group related metrics together (e.g., KPIs at the top, detailed breakdowns below).
○ Use consistent colors for better readability.
● KPIs:
○ Display key metrics using Scorecards.
○ Example:
■ Total Sales: $1,200,000
■ YOY Growth: 15%
● Visualization:
○ Use color-coded scorecards to indicate performance (e.g., red for negative growth, green for positive).
CASE | CASE WHEN condition THEN result END | Creates conditional logic (similar to IF).
8. Optimization Tips
● Performance:
○ Use aggregated data to reduce computation time.
○ Avoid overloading dashboards with unnecessary charts.
● Caching:
○ Use the Extract Data Connector to cache static datasets.
○ Enable Google Analytics sampling if working with large datasets.
● Version Control:
○ Save dashboard snapshots before making major changes.