Data Modeling Featurization Visualization
1. What is Data Modeling?
Definition:
Data modeling is the process of creating a mathematical or logical structure to represent data and its relationships.
Tools/Libraries:
- R: caret, glm
Example:
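from sklearn.linear_model import LinearRegression
X, y = [[1], [2], [3]], [2, 4, 6]  # toy data: one feature, y = 2x
model = LinearRegression()
model.fit(X, y)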
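A slightly fuller sketch of the same idea shows what "fitting" actually produces: a learned slope and intercept that can then be used for prediction. It assumes scikit-learn and NumPy and uses a tiny made-up dataset in which y is exactly 2x + 1, so the recovered parameters are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: one feature, target follows y = 2x + 1 exactly
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
model.fit(X, y)

# The fitted structure is just a slope and an intercept
slope = model.coef_[0]
intercept = model.intercept_
print(f"slope={slope:.2f}, intercept={intercept:.2f}")

# Apply the model to an input it has not seen
prediction = model.predict(np.array([[5.0]]))[0]
print(f"prediction for x=5: {prediction:.2f}")
```

Because the toy data is noise-free, the model recovers a slope of 2 and an intercept of 1, and predicts 11 for x = 5.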
2. What is Featurization?
Definition:
Featurization is the process of converting raw data into meaningful input features that can be used by machine learning models.
Tools/Libraries:
- pandas - for data manipulation
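As a sketch of featurization with pandas alone, raw columns can be turned into model-ready numeric features by deriving a new column and one-hot encoding a categorical one. The table and its column names (`city`, `price`, `quantity`) are made up for illustration.

```python
import pandas as pd

# Hypothetical raw data: one categorical and two numeric columns
raw = pd.DataFrame({
    "city": ["Paris", "Lyon", "Paris"],
    "price": [10.0, 20.0, 15.0],
    "quantity": [1, 3, 2],
})

# Derived numeric feature: total spend per row
raw["total"] = raw["price"] * raw["quantity"]

# One-hot encode the categorical column into 0/1 indicator columns
features = pd.get_dummies(raw, columns=["city"], dtype=int)
print(features)
```

Each distinct city becomes its own indicator column, which lets models that only accept numbers use the categorical information.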
Examples:
Numerical Scaling:
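import numpy as np
from sklearn.preprocessing import MinMaxScaler
data = np.array([[1.0], [5.0], [10.0]])  # toy numeric column
scaler = MinMaxScaler()  # rescales each column to [0, 1]
print(scaler.fit_transform(data))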
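Min-max scaling maps each column to the range [0, 1] with x' = (x - min) / (max - min). The sketch below computes that formula directly with NumPy and checks it against `MinMaxScaler`. It assumes scikit-learn and NumPy, and the two-column array is made up to show features on very different scales.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: two features on very different scales
data = np.array([[1.0, 100.0],
                 [5.0, 300.0],
                 [9.0, 500.0]])

# Manual min-max scaling, column by column
col_min = data.min(axis=0)
col_max = data.max(axis=0)
manual = (data - col_min) / (col_max - col_min)

# The same transformation via scikit-learn
scaled = MinMaxScaler().fit_transform(data)

print(manual)
print(np.allclose(manual, scaled))  # True: both approaches agree
```

After scaling, both columns run from 0 to 1, so neither dominates a distance-based or gradient-based model purely because of its units.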
Text to Features:
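from sklearn.feature_extraction.text import CountVectorizer
text = ["data is useful", "useful features from data"]  # toy corpus
vectorizer = CountVectorizer()  # counts word occurrences per document
print(vectorizer.fit_transform(text).toarray())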
3. What is Data Visualization?
Definition:
Data visualization is the graphical representation of information and data. It helps to understand patterns, trends, and outliers in the data.
Tools/Libraries:
- matplotlib
- seaborn
- plotly
Example:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]  # illustrative values
plt.plot(x, y)
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
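A histogram is a common way to look at how a single variable is distributed. This sketch assumes matplotlib and NumPy, with randomly generated normal data standing in for a real column. `plt.hist` draws the bars and also returns the bin counts it used. Two choices here are only so the script runs without a display: the non-interactive `Agg` backend and the output filename `histogram.png`.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 1,000 draws from a normal distribution
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=50, scale=10, size=1000)

# Histogram: bar heights are the number of values falling in each bin
counts, bin_edges, _ = plt.hist(values, bins=20, edgecolor="black")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Distribution of values")
plt.savefig("histogram.png")  # hypothetical output file
plt.close()

print(int(counts.sum()), len(bin_edges))
```

With 20 bins there are 21 bin edges, and the counts add up to the number of values plotted.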
Summary Table
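| Concept | Definition | Tools/Libraries | Example Uses |
|----------------|---------------------------------------|------------------------------|----------------------------------|
| Data Modeling | Creating a mathematical or logical structure to represent data | caret, glm | … sales or outcomes |
| Featurization | Converting raw data into features | pandas, sklearn, NLTK, spaCy | Scaling, … |
| Data Visualization | Graphical representation of information and data | matplotlib, seaborn, plotly | … distribution analysis |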
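In practice the three steps in the table usually run in sequence. The sketch below shows one way to chain them:

1. Scale the raw input (featurization).
2. Fit a linear model on the scaled feature (modeling).
3. Plot the data against the fitted line (visualization).

It assumes scikit-learn, NumPy and matplotlib. The "spend vs. sales" dataset and the output filename are hypothetical. `make_pipeline` bundles the first two steps, so the scaling learned during `fit` is applied automatically at prediction time.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: advertising spend (X) and sales (y), with y = 3x + 5
X = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])
y = 3 * X.ravel() + 5

# Featurization (scaling) + modeling (regression) in one object
pipeline = make_pipeline(MinMaxScaler(), LinearRegression())
pipeline.fit(X, y)

# New inputs are scaled with the min/max learned during fit
pred = pipeline.predict(np.array([[60.0]]))[0]
print(f"predicted sales at spend=60: {pred:.1f}")

# Visualization: raw points and the fitted line
plt.scatter(X.ravel(), y, label="data")
plt.plot(X.ravel(), pipeline.predict(X), color="red", label="fitted line")
plt.xlabel("Spend")
plt.ylabel("Sales")
plt.legend()
plt.savefig("pipeline_fit.png")  # hypothetical output file
plt.close()
```

Keeping the scaler inside the pipeline avoids a common mistake: fitting the scaler separately and then forgetting to apply the same transformation to new data.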