Data Processing with Python and R
1. Introduction to Programming with Python
Python is a high-level, general-purpose programming language known for its simplicity and
versatility. It is widely used for data processing, analysis, and visualization. Below are the
foundational topics:
1.1 Basic Language Structures in Python
• Data Types:
• Primitive: int, float, str, bool
• Composite: list, tuple, dict, set
• Basic Operations:
• Arithmetic (+, -, *, /, //, %, **), where // is floor division, % is the remainder, and ** is exponentiation
• Relational (==, !=, <, >, <=, >=)
• Logical (and, or, not)
• Control Structures:
• Conditional Statements: if, elif, else
• Loops:
• for: Iterates over the items of a sequence or any other iterable.
• while: Executes as long as a condition is true.
• Functions:
• Definition: def function_name(parameters):
• Return values with return
• Example:
def add(a, b):
    return a + b
• Modules:
• Importing libraries: import math, from random import randint
• Reusing code from external Python files.
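The topics above can be tied together in one short script. This is a minimal sketch rather than a canonical program: the function name classify and all variable names are invented for illustration.

```python
import math
from random import randint


def classify(n):
    """Return a label for a number using if / elif / else."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"


# Arithmetic: / is true division, // floors the result, % gives the remainder.
quotient = 7 // 2
remainder = 7 % 2
ratio = 7 / 2

# for loop: visit each item of a list and accumulate a total.
total = 0
for value in [1, 2, 3, 4]:
    total += value

# while loop: repeat as long as the condition stays true.
countdown = 3
steps = []
while countdown > 0:
    steps.append(countdown)
    countdown -= 1

# Logical operators combine the results of relational tests.
in_range = total > 5 and not total >= 100

# Names imported from modules are called like any other function.
root = math.sqrt(16)
roll = randint(1, 6)  # random integer from 1 to 6, both ends included

print(classify(-5), quotient, remainder, ratio, total, steps, in_range, root)
```

Only roll changes between runs; every other value is deterministic.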
2. Data Acquisition and Presentation
2.1 Acquiring Data
1. Local Data:
• File operations: Reading and writing files.
with open("data.txt", "r") as file:
    data = file.read()
2. Network Data:
• Fetching web data using libraries like requests.
import requests
response = requests.get("http://example.com/data")
print(response.text)
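The local file example in item 1 shows reading only. As a hedged sketch of the writing side, the block below writes a few lines and reads them back. The helper names write_lines and read_lines are hypothetical, and a temporary directory is used so no real file is touched.

```python
import os
import tempfile


def write_lines(path, lines):
    """Write each string in lines to path, one per line."""
    with open(path, "w", encoding="utf-8") as file:
        for line in lines:
            file.write(line + "\n")


def read_lines(path):
    """Return the lines of path with trailing newlines removed."""
    with open(path, "r", encoding="utf-8") as file:
        return [line.rstrip("\n") for line in file]


# The temporary directory is deleted automatically when the block ends.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "data.txt")
    write_lines(demo_path, ["alpha", "beta", "gamma"])
    loaded = read_lines(demo_path)

print(loaded)
```

The with statement closes each file even if an error occurs inside the block, which is why it is generally preferred over calling close() by hand.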
2.2 Data Structures in Python
1. Sequences:
• Strings: Immutable sequences of characters.
• String slicing: text[0:5]
• Lists: Mutable ordered collections.
• Example: my_list = [1, 2, 3]
• Tuples: Immutable ordered collections.
• Example: my_tuple = (1, 2, 3)
2. Basic Data Presentation:
• Example: Reading a CSV file and presenting data in tabular format.
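One possible way to read CSV data and present it as a table, using only the standard library, is sketched below. The sample data, the helper names read_rows and format_table, and the column layout are all assumptions made for illustration; a real script would open a .csv file instead of an in-memory string.

```python
import csv
import io

# Hypothetical sample data standing in for the contents of a CSV file.
SAMPLE_CSV = "name,city,score\nAda,London,91\nLinus,Helsinki,85\n"


def read_rows(handle):
    """Parse CSV text from a file-like object into a list of rows."""
    return [row for row in csv.reader(handle)]


def format_table(rows):
    """Return the rows as aligned text columns with a rule under the header."""
    # zip(*rows) regroups the cells by column so each width can be measured.
    widths = [max(len(cell) for cell in column) for column in zip(*rows)]
    lines = []
    for row in rows:
        cells = [cell.ljust(width) for cell, width in zip(row, widths)]
        lines.append(" | ".join(cells).rstrip())
    rule = "-+-".join("-" * width for width in widths)
    lines.insert(1, rule)
    return "\n".join(lines)


rows = read_rows(io.StringIO(SAMPLE_CSV))
table = format_table(rows)
print(table)
```

To read from disk, pass an open file object to read_rows in place of io.StringIO; the csv documentation recommends opening such files with newline="".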
3. Data Visualization Libraries in Python
3.1 Matplotlib
• Plotting Basic Graphs:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [4, 5, 6])
plt.show()
• Customizations:
• Titles, labels, legends, colors, and line styles.
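The customizations listed above might look like the following sketch. The data, label text, and colors are illustrative assumptions. The Agg backend is selected so the script also runs without a display, and the figure is rendered into memory instead of a window.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import matplotlib.pyplot as plt

x = [1, 2, 3]
fig, ax = plt.subplots()

# color and linestyle set the appearance; label supplies the legend text.
ax.plot(x, [4, 5, 6], color="tab:blue", linestyle="-", label="series A")
ax.plot(x, [6, 5, 4], color="tab:red", linestyle="--", label="series B")

ax.set_title("Two sample series")
ax.set_xlabel("x value")
ax.set_ylabel("y value")
legend = ax.legend()

# Render the figure as PNG bytes in memory instead of writing a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

With an interactive backend, plt.show() would display the same figure.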
3.2 Image Processing
• Using Pillow for image manipulation.
from PIL import Image
img = Image.open("example.jpg")
img.show()
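A few common Pillow manipulations are sketched below. To keep the example independent of example.jpg, it builds a solid-color image in memory; the sizes and color are arbitrary assumptions.

```python
from PIL import Image

# Build a small solid-color image in memory so no file is required.
img = Image.new("RGB", (200, 100), color=(255, 128, 0))

thumbnail = img.resize((100, 50))      # scale to half the width and height
gray = img.convert("L")                # "L" is 8-bit grayscale
rotated = img.rotate(90, expand=True)  # expand enlarges the canvas to fit
cropped = img.crop((0, 0, 50, 50))     # box is (left, upper, right, lower)

print(thumbnail.size, gray.mode, rotated.size, cropped.size)
```

Each of these methods returns a new image and leaves img unchanged.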
4. Powerful Data Structures and Python Extension Libraries
4.1 Dictionaries and Sets
• Dictionaries: Key-value pairs.
my_dict = {"key1": "value1", "key2": "value2"}
• Sets: Unordered collections of unique elements.
my_set = {1, 2, 3, 4, 4}  # the duplicate 4 is discarded, leaving {1, 2, 3, 4}
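Typical operations on these two structures could look like the sketch below. The variable names and the second set, other, are illustrative assumptions.

```python
my_dict = {"key1": "value1", "key2": "value2"}

my_dict["key3"] = "value3"              # add a new key
my_dict["key1"] = "updated"             # overwrite an existing key
missing = my_dict.get("absent", "n/a")  # default instead of a KeyError

# items() yields (key, value) pairs in insertion order.
pairs = [f"{key}={value}" for key, value in my_dict.items()]

my_set = {1, 2, 3, 4, 4}  # the repeated 4 is stored only once
other = {3, 4, 5}

union = my_set | other      # elements in either set
common = my_set & other     # elements in both sets
only_mine = my_set - other  # elements in my_set but not in other

print(pairs, len(my_set), union, common, only_mine)
```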
4.2 NumPy for Arrays
• ndarray: Efficient array structure for numerical data.
import numpy as np
arr = np.array([1, 2, 3])
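Much of the efficiency of ndarray comes from operating on whole arrays at once rather than looping in Python. A short sketch of that style, with illustrative variable names, follows.

```python
import numpy as np

arr = np.array([1, 2, 3])

doubled = arr * 2   # element-wise multiplication, no explicit loop
squared = arr ** 2  # element-wise power
total = arr.sum()
mean = arr.mean()

# reshape arranges the six values 0..5 into 2 rows and 3 columns.
grid = np.arange(6).reshape(2, 3)
column_sums = grid.sum(axis=0)  # axis=0 sums down each column

# A boolean mask selects the elements for which the condition holds.
mask = arr > 1
selected = arr[mask]

print(doubled, squared, total, mean, column_sums, selected)
```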
4.3 Pandas for Series and DataFrames
• Series: One-dimensional labeled data.
import pandas as pd
series = pd.Series([1, 2, 3], index=["a", "b", "c"])
• DataFrame: Two-dimensional labeled data.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
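The labels are what distinguish these structures from plain arrays. The sketch below shows a few ways to access values by label and by position; the derived column "C" is an invented example.

```python
import pandas as pd

series = pd.Series([1, 2, 3], index=["a", "b", "c"])
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

by_label = series["b"]        # look up by index label
by_position = series.iloc[0]  # look up by integer position

column = df["A"]             # a single column is itself a Series
df["C"] = df["A"] + df["B"]  # new column computed row by row
first_row = df.loc[0]        # row with index label 0
cell = df.loc[1, "B"]        # row label 1, column "B"

print(by_label, by_position, cell)
print(df)
```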
5. Data Statistics and Mining
5.1 Data Cleaning
• Handling missing values:
df.fillna(0, inplace=True)
• Removing duplicates:
df.drop_duplicates(inplace=True)
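A slightly fuller cleaning pass might resemble the sketch below. The sample frame and column names are invented. Unlike the inplace=True calls above, each step here returns a new DataFrame, so the raw data stays available for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with missing values and a repeated row.
raw = pd.DataFrame({
    "name": ["a", "b", "b", "c"],
    "value": [1.0, np.nan, np.nan, 4.0],
})

missing_per_column = raw.isna().sum()   # count missing values per column
filled = raw.fillna({"value": 0})       # fill only the "value" column
deduplicated = filled.drop_duplicates() # keep the first of identical rows
complete_only = raw.dropna()            # alternative: drop incomplete rows

print(missing_per_column)
print(deduplicated)
```

Whether to fill or drop missing values depends on the data; filling with 0 is only one option.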
5.2 Data Exploration
• Basic statistics:
df.describe()
• Correlation:
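df.corr(numeric_only=True)  # pandas 1.5+; skips non-numeric columns, which otherwise raise an error in pandas 2.0+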
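To make the two exploration calls concrete, the sketch below applies them to a small invented frame. The numeric columns are selected explicitly before computing correlations, which is one way to avoid problems with text columns.

```python
import pandas as pd

# Hypothetical data: score is exactly twice hours, so they correlate perfectly.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4],
    "score": [2, 4, 6, 8],
    "group": ["x", "x", "y", "y"],
})

# describe() summarizes the numeric columns: count, mean, std, quartiles.
summary = df.describe()
mean_hours = summary.loc["mean", "hours"]

# Pearson correlation between every pair of the selected columns.
corr = df[["hours", "score"]].corr()
r = corr.loc["hours", "score"]

print(summary)
print(corr)
```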
5.3 Data Analysis Using Pandas
• Grouping data:
df.groupby("column_name").mean(numeric_only=True)  # average only the numeric columns
• Filtering data:
df[df["column_name"] > 10]
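The grouping and filtering patterns above use a placeholder column name. A sketch with invented data shows what they return in practice.

```python
import pandas as pd

# Hypothetical sales records.
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome"],
    "sales": [5, 15, 20, 8],
})

# Group rows by city, then average the sales within each group.
mean_by_city = df.groupby("city")["sales"].mean()

# A boolean condition keeps only the rows where it is True.
high = df[df["sales"] > 10]

# Combine conditions with & (and) or | (or); each needs its own parentheses.
both = df[(df["sales"] > 10) & (df["city"] == "Rome")]

print(mean_by_city)
print(high)
```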
6. Object Orientation and GUI in Python
6.1 Object-Oriented Programming
• Key Concepts:
• Abstraction: Hiding details to simplify usage.
• Inheritance: Creating new classes from existing ones.
• Encapsulation: Bundling data with the methods that operate on it.
• Example:
class Animal:
    def __init__(self, name):
        self.name = name

class Dog(Animal):
    def bark(self):
        return f"{self.name} says Woof!"
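The sketch below repeats the Animal and Dog classes and then uses them, to show what inheritance provides. The describe method and the Puppy class are hypothetical additions for illustration and are not part of the original example.

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def describe(self):  # hypothetical method added for this sketch
        return f"{self.name} is an animal."


class Dog(Animal):
    # Dog inherits __init__ and describe from Animal and adds bark.
    def bark(self):
        return f"{self.name} says Woof!"


class Puppy(Dog):  # hypothetical subclass
    def __init__(self, name, age_months):
        super().__init__(name)  # reuse the parent initializer
        self.age_months = age_months

    def describe(self):  # overrides the inherited version
        return f"{self.name} is a {self.age_months}-month-old puppy."


rex = Dog("Rex")
bit = Puppy("Bit", 3)

print(rex.bark())
print(rex.describe())
print(bit.bark())
print(bit.describe())
```

Puppy never defines bark, yet bit.bark() works because attribute lookup continues up the class hierarchy through Dog.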
6.2 GUI with Python
• Using Tkinter for GUI applications:
import tkinter as tk
root = tk.Tk()
label = tk.Label(root, text="Hello, World!")
label.pack()
root.mainloop()
7. Introduction to R for Data Processing
7.1 Basics of R
• Data Types: numeric, integer, character, logical, and complex; factors (categorical data) and vectors are structures built on these types.
• Basic Operations:
• Arithmetic: +, -, *, /
• Relational: >, <, ==, !=
• Control Structures:
• if, for, while
7.2 Data Structures in R
1. Vectors: One-dimensional arrays whose elements all share one type.
vec <- c(1, 2, 3)
2. Data Frames: Tabular data.
df <- data.frame(A = 1:3, B = c("x", "y", "z"))
3. Matrices: Two-dimensional arrays of a single type, filled column by column by default.
mat <- matrix(1:6, nrow=2)