Python & Data Analysis Vocabulary List
Python Vocabulary
Variable: A name that stores a value (e.g., x = 5)
Data Type: The kind of data (string, integer, float, boolean, etc.)
String: Text data inside quotes (e.g., 'Hello')
Integer (int): Whole numbers like 5, 100
Float: Decimal numbers like 3.14
Boolean: True or False values
List: A collection of items inside [] (e.g., [1, 2, 3])
Tuple: Like a list, but cannot be changed after creation (immutable), inside () (e.g., (1, 2, 3))
Dictionary (dict): Data stored as key-value pairs {key: value}
Set: An unordered collection of unique items (no duplicates), inside {} (e.g., {1, 2, 3})
Operator: Symbols like +, -, *, / used in calculations
Conditional Statement: Code that makes decisions using if, elif, else
Loop: Code that repeats (e.g., for loop, while loop)
Function (def): Reusable block of code, defined using def
Argument / Parameter: A parameter is a name listed in a function's definition; an argument is the value passed in when the function is called
Return: Gives back a result from a function
Indentation: Spaces or tabs used to structure Python code
Class: A blueprint for creating objects
Object: An instance of a class
Attribute: A variable inside a class/object
Method: A function inside a class
Module: A Python file containing functions/classes
Package: A collection of Python modules
Import: Bringing in external code using import
Library: A collection of ready-made code for specific tasks (e.g., Pandas, NumPy)
Exception: An error detected during program execution
Try-Except Block: Code that catches and handles exceptions so the program does not crash
Lambda Function: A small anonymous function, written with the lambda keyword
List Comprehension: A quick way to create lists with a loop in one line
Recursion: A function calling itself
Decorator: A function that adds extra features to another function
Iterable: Any object you can loop over (such as a list or tuple)
Iterator: An object that remembers its place during iteration and gives the next item each time next() is called
Generator: A function that produces values one at a time using yield
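
Several of the terms above are easier to remember when seen together in running code. The following is a minimal sketch, not a definitive reference; the names squares, countdown, count_calls, safe_divide, and words are made up for illustration.

```python
import functools

# List comprehension: build a list of squares in one line.
squares = [n * n for n in range(5)]


# Generator: yields values one at a time instead of building a full list.
def countdown(start):
    while start > 0:
        yield start
        start -= 1


# Decorator: wraps another function to add behavior (here, counting calls).
def count_calls(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls
def safe_divide(a, b):
    # Try-except block: handle the error instead of crashing.
    try:
        return a / b
    except ZeroDivisionError:
        return None


# Lambda function: a small anonymous function used as a sort key.
words = sorted(["pear", "fig", "banana"], key=lambda w: len(w))
```

Calling list(countdown(3)) collects the generated values into a list, and safe_divide(1, 0) returns None rather than raising an exception.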
Data Analysis Vocabulary
DataFrame: A table of data (like Excel), main structure in Pandas
Series: A one-dimensional labeled array in Pandas (e.g., a single column of a DataFrame)
Index: Row labels in a DataFrame
CSV: Comma Separated Values file format
Excel File (.xlsx): Excel spreadsheet file format
JSON: JavaScript Object Notation, a readable data format
Data Cleaning: Fixing or removing bad data (e.g., null values, duplicates)
Missing Data (NaN): Empty or unavailable values; Pandas marks them as NaN ("Not a Number")
Duplicate: Repeated data entries
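Merge: Combining two DataFrames SQL-style, based on common columns or indexes
Join: The SQL term for combining datasets on matching keys; in Pandas, DataFrame.join combines on the index by default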
GroupBy: Grouping data and performing calculations (sum, mean, count)
Aggregation: Summarizing data (like total sales, average)
Pivot Table: A tool to summarize data by rows and columns
Reshape: Changing the structure of a DataFrame
Filter: Selecting data rows that meet certain conditions
Sort: Arranging data in order (ascending/descending)
Indexing: Accessing specific rows/columns in data
Slicing: Cutting a portion of data
Correlation: A measure of how strongly two variables move together (typically from -1 to 1)
Outlier: A data point that is very different from others
Skewness: A measure of how asymmetric a distribution is (a longer tail on the left or right)
Kurtosis: Measure of whether data has heavy/light tails (extreme values)
Standard Deviation: Measure of how spread out numbers are
Variance: The average squared distance from the mean (the square of the standard deviation)
Normalization: Scaling data to a fixed range, usually 0 to 1 (min-max scaling)
Standardization: Scaling data to have mean=0 and std=1
Histogram: Graph showing frequency distribution
Scatter Plot: Graph showing relationship between two variables
Box Plot: Graph showing data distribution with median, quartiles, and outliers
Bar Chart: Graph using bars to represent data values
Line Chart: Graph using lines to show trends over time
EDA: Exploratory Data Analysis - Analyzing data to find patterns
Feature: An input column in the data, used to make predictions
Target Variable: The output you want to predict
Train/Test Split: Dividing data for model training and testing
Overfitting: When a model learns the training data too closely, including its noise, and predicts poorly on new data
Underfitting: When a model is too simple and doesn't learn enough
Model: A mathematical formula created to analyze or predict data
Machine Learning: Teaching computers to learn patterns from data
Algorithm: A set of steps to solve a problem (e.g., Linear Regression)
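
The definitions of variance, standard deviation, normalization, and standardization above can be checked with a few lines of arithmetic. This is a small sketch using only Python's built-in statistics module and the population formulas (dividing by n); the sample numbers are invented for illustration, and in practice Pandas or Scikit-learn would usually do this work.

```python
import statistics

# Made-up sample data for illustration.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.fmean(values)          # average of the values
variance = statistics.pvariance(values)  # average squared distance from the mean
std = statistics.pstdev(values)          # square root of the variance

# Normalization (min-max scaling): smallest value becomes 0, largest becomes 1.
low, high = min(values), max(values)
normalized = [(v - low) / (high - low) for v in values]

# Standardization: subtract the mean, divide by the standard deviation,
# so the result has mean 0 and standard deviation 1.
standardized = [(v - mean) / std for v in values]

print(mean, variance, std)
print(normalized)
print(standardized)
```

For this data the mean is 5.0, the variance is 4.0, and the standard deviation is 2.0, which makes the relationship "variance = standard deviation squared" easy to see.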
Common Python Data Libraries
Pandas: For data manipulation and analysis
NumPy: For numerical computations and arrays
Matplotlib: For data visualization (charts, plots)
Seaborn: For statistical plots with polished default styling (built on Matplotlib)
Scikit-learn: For machine learning models
Statsmodels: For statistical analysis
OpenPyXL: For reading and writing Excel (.xlsx) files
SQLAlchemy: For connecting to and querying SQL databases
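
A library's display name is not always the name used in an import statement (Scikit-learn, for example, is imported as sklearn). The sketch below lists the usual import name for each library above and reports which ones are installed, using only the standard library; the dictionary names IMPORT_NAMES and installed are made up for illustration.

```python
import importlib.util

# Library name as written above -> module name used with "import".
IMPORT_NAMES = {
    "Pandas": "pandas",
    "NumPy": "numpy",
    "Matplotlib": "matplotlib",
    "Seaborn": "seaborn",
    "Scikit-learn": "sklearn",
    "Statsmodels": "statsmodels",
    "OpenPyXL": "openpyxl",
    "SQLAlchemy": "sqlalchemy",
}

# find_spec returns None when a top-level module cannot be found.
installed = {
    library: importlib.util.find_spec(module) is not None
    for library, module in IMPORT_NAMES.items()
}

for library, is_installed in installed.items():
    status = "installed" if is_installed else "not installed"
    print(f"{library} (import {IMPORT_NAMES[library]}): {status}")
```

Anything reported as not installed can usually be added with pip, though the package name on PyPI may differ from the import name (Scikit-learn is installed as scikit-learn).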