Uploaded by

CD Monib

Python & Data Analysis Vocabulary List

Python Vocabulary
Variable: A name that refers to a stored value (e.g., x = 5)

Data Type: The kind of data (string, integer, float, boolean, etc.)

String: Text data inside quotes (e.g., 'Hello')

Integer (int): Whole numbers like 5, 100

Float: Decimal numbers like 3.14

Boolean: True or False values

List: A collection of items inside [] (e.g., [1, 2, 3])

Tuple: Like a list, but immutable (cannot be changed after creation), written inside () (e.g., (1, 2, 3))

Dictionary (dict): Data stored as key-value pairs {key: value}

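Set: An unordered collection of unique items (no duplicates), written inside {} (e.g., {1, 2, 3}); an empty set is created with set(), because {} creates an empty dictionary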

Operator: Symbols like +, -, *, / used in calculations

Conditional Statement: Code that makes decisions using if, elif, else

Loop: Code that repeats (e.g., for loop, while loop)

Function (def): Reusable block of code, defined using def

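Argument / Parameter: A parameter is a name listed in a function definition; an argument is the value passed for that parameter when the function is called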

Return: Gives back a result from a function

Indentation: Spaces or tabs used to structure Python code

Class: A blueprint for creating objects

Object: An instance of a class

Attribute: A variable inside a class/object

Method: A function inside a class

Module: A Python file containing functions/classes

Package: A collection of Python modules

Import: Bringing in external code using import

Library: A collection of ready-made code for specific tasks (e.g., Pandas, NumPy)

Exception: An error detected during program execution


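Try-Except Block: Code that runs statements in a try block and handles any exception they raise in an except block, so the program does not crash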

Lambda Function: A small anonymous function

List Comprehension: A quick way to create lists with a loop in one line

Recursion: A function calling itself

Decorator: A function that adds extra features to another function

Iterable: Any object you can loop over (like list, tuple)

Iterator: An object that remembers its place during iteration and returns the next item each time next() is called

Generator: A function that returns values one at a time using yield
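
Several of the terms above can be seen working together in one short script. The following is a minimal sketch, not part of the original list: the names log_calls, safe_divide, and countdown are hypothetical and chosen only for illustration.

```python
import functools


def log_calls(func):
    """Decorator: wraps func and records each call's arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)
    wrapper.calls = []  # attribute attached to the wrapper function object
    return wrapper


@log_calls
def safe_divide(a, b):  # a and b are parameters
    """Return a / b, or None when b is zero."""
    try:
        return a / b
    except ZeroDivisionError:  # exception raised when dividing by zero
        return None


def countdown(n):
    """Generator: yields n, n-1, ..., 1 one value at a time."""
    while n > 0:
        yield n
        n -= 1


squares = [x * x for x in range(5)]  # list comprehension
double = lambda x: x * 2             # lambda (anonymous) function
numbers = list(countdown(3))         # consume the generator into a list
it = iter([10, 20])                  # iterator obtained from an iterable
first = next(it)                     # the iterator remembers its place
result = safe_divide(10, 4)          # 10 and 4 are arguments
fallback = safe_divide(1, 0)         # handled by the except block
```

Here safe_divide(1, 0) returns None instead of stopping the program, which is the point of the try-except block, and log_calls adds call recording without changing the body of safe_divide.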


Data Analysis Vocabulary
DataFrame: A table of data (like Excel), main structure in Pandas

Series: A one-dimensional labeled array in Pandas; a single DataFrame column is a Series

Index: Row labels in a DataFrame

CSV: Comma Separated Values file format

Excel File (.xlsx): Excel spreadsheet file format

JSON: JavaScript Object Notation, a readable data format

Data Cleaning: Fixing or removing bad data (e.g., null values, duplicates)

Missing Data (NaN): Empty or unavailable values; NaN stands for "Not a Number"

Duplicate: Repeated data entries

Merge: Combining two DataFrames based on common columns

Join: SQL-style combination of datasets (inner, left, right, outer); in Pandas, DataFrame.join combines on the index by default

GroupBy: Grouping data and performing calculations (sum, mean, count)

Aggregation: Summarizing data (like total sales, average)

Pivot Table: A tool to summarize data by rows and columns

Reshape: Changing the structure of a DataFrame

Filter: Selecting data rows that meet certain conditions

Sort: Arranging data in order (ascending/descending)

Indexing: Accessing specific rows/columns in data

Slicing: Selecting a contiguous portion of data (e.g., the first 10 rows)

Correlation: Measuring the strength and direction of the relationship between variables

Outlier: A data point that is very different from others

Skewness: Measure of how asymmetric a distribution is (a longer tail on the left or on the right)

Kurtosis: Measure of whether data has heavy/light tails (extreme values)

Standard Deviation: Measure of how spread out numbers are

Variance: The average squared deviation from the mean; the square of the standard deviation

Normalization: Scaling data to a fixed range, typically 0 to 1 (min-max scaling)

Standardization: Scaling data to have mean=0 and std=1

Histogram: Graph showing frequency distribution


Scatter Plot: Graph showing relationship between two variables

Box Plot: Graph showing data distribution with median, quartiles, and outliers

Bar Chart: Graph using bars to represent data values

Line Chart: Graph using lines to show trends over time

EDA: Exploratory Data Analysis - Analyzing data to find patterns

Feature: An input column (variable) in the data, used to make predictions

Target Variable: The output you want to predict

Train/Test Split: Dividing data for model training and testing

Overfitting: When a model learns the noise and fine detail of the training data and therefore predicts poorly on new data

Underfitting: When a model is too simple and doesn't learn enough

Model: A mathematical formula created to analyze or predict data

Machine Learning: Teaching computers to learn patterns from data

Algorithm: A set of steps to solve a problem (e.g., Linear Regression)
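
The definitions of variance, standard deviation, normalization, and standardization above can be checked with a few lines of arithmetic. This is a minimal sketch using only Python's standard statistics module. The sample values are made up for illustration, and the population formulas (dividing by n) are an assumption, since the list does not say whether population or sample statistics are meant.

```python
import statistics

# Made-up sample values, chosen so the results are round numbers.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(values)
variance = statistics.pvariance(values)  # population variance (divide by n)
std_dev = statistics.pstdev(values)      # population standard deviation

# Normalization (min-max scaling): map the values onto the range 0 to 1.
lowest, highest = min(values), max(values)
normalized = [(v - lowest) / (highest - lowest) for v in values]

# Standardization: shift and scale so that mean = 0 and std = 1.
standardized = [(v - mean) / std_dev for v in values]

print(mean, variance, std_dev)
print(normalized)
print(standardized)
```

With these values the mean is 5, the variance is 4, and the standard deviation is 2, so the variance is the square of the standard deviation as the definition states. Libraries differ in their defaults: Pandas computes the sample standard deviation (dividing by n - 1) unless told otherwise, so its result for the same data would be slightly larger.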


Common Python Data Libraries
Pandas: For data manipulation and analysis

NumPy: For numerical computations and arrays

Matplotlib: For data visualization (charts, plots)

Seaborn: For statistical plots, built on top of Matplotlib

Scikit-learn: For machine learning models

Statsmodels: For statistical analysis

OpenPyXL: For reading and writing Excel (.xlsx) files

SQLAlchemy: For connecting to and querying SQL databases
