
Data Structuring & Data Gathering
By Deepti
Introduction & Motivation
Talking Points:
• “Most of the hard work in machine learning isn’t the
modeling. It’s in collecting and preparing the data.”
• Poorly structured or noisy data leads to misleading
results, model drift, and failed predictions.
• Structured data is the foundation for successful ML—
just like good soil is critical for growing plants.
What is Data Gathering?
•The process of collecting raw data from relevant sources.
•In ML, data can be manually labeled, logged automatically, or collected via APIs, databases,
sensors, etc.
•Examples:
• User interaction logs from a mobile app
• Sensor data from IoT devices
• Survey forms, transaction logs, e-commerce activity
What is Data Gathering?
Data gathering is the process of collecting raw information from various sources for analysis.
It can come from:
• Internal systems (like databases or logs)
• External sources (APIs, web scraping, public datasets)
• User inputs (forms, surveys, feedback)

• Sensors or devices (IoT)

The goal is not just to collect a lot of data, but to collect the right data.

Why Data Gathering Matters
Poor data leads to poor insights. The phrase “garbage in, garbage out” is very real in data
science. Accurate, relevant, and timely data ensures:
• Better models
• Smarter decisions
• Increased trust in your results

Types of Data

During gathering, we deal with:


•Structured data: Numbers, categories, tables — easy to store in databases.
•Unstructured data: Text, images, videos, social media content — harder to manage but
extremely valuable.
•Semi-structured data: JSON, XML — somewhere in between.
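To make the distinction concrete, here is a minimal Python sketch (the order record and its fields are hypothetical) showing the same information as a structured table row and as semi-structured JSON:

```python
import json
import pandas as pd

# Structured: the record fits a fixed tabular schema.
structured = pd.DataFrame(
    [{"order_id": 101, "product": "Laptop", "price": 899.99}]
)

# Semi-structured: JSON carries the same facts plus a nested, optional field.
semi_structured = json.loads(
    '{"order_id": 101, "product": "Laptop", "price": 899.99,'
    ' "tags": ["electronics", "sale"]}'
)

print(structured.dtypes)        # fixed columns with fixed types
print(semi_structured["tags"])  # nested list a flat table cannot hold directly
```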
Common Data Gathering Methods

Let’s go through some key methods:


• Manual collection – Surveys, interviews, feedback forms
• Automated systems – Log files, transactions, sensors
• APIs – Pulling data from platforms like Twitter, Google, or public services
• Web scraping – Extracting data from websites (ethically and legally)
• Third-party datasets – Kaggle, government portals, research institutions
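As an illustration of the API method, here is a hedged sketch using the requests library; the endpoint URL, API key, query parameters, and response shape are all hypothetical placeholders, since every real service differs:

```python
import requests
import pandas as pd

# Hypothetical endpoint and API key; real services differ in URL,
# authentication, and response shape.
API_URL = "https://api.example.com/v1/events"
API_KEY = "YOUR_API_KEY"

response = requests.get(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={"since": "2025-05-01", "limit": 1000},
    timeout=30,
)
response.raise_for_status()  # fail loudly on HTTP errors

# Assume the API returns a JSON list of event records.
events = pd.DataFrame(response.json())
events.to_csv("raw_events.csv", index=False)
```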

Challenges in Data Gathering


•Incomplete or missing data
•Noisy or irrelevant information
•Data privacy and legal constraints
•Access limitations or API quotas
1. You're tasked with predicting customer churn for a subscription service. The CRM system
logs events like logins, payments, and cancellations. What's the best first step to gather
useful data for your model?
A. Train a churn model using only recent cancellation data
B. Export the event logs and structure them into user-level summaries
C. Ask customer support for anecdotal reasons why people churn
D. Use social media sentiment to guess why users are unhappy

Ans: B

2. You're building a recommendation system for an e-commerce app. The product team asks
you to include user interaction logs. The logs contain time stamps but are in different time
zones. What should you do before using the data?
A. Ignore time zones since timestamps are relative
B. Convert all timestamps to UTC or a consistent time zone
C. Use the raw timestamps as-is
D. Remove timestamps entirely to simplify the dataset

Ans: B
What is Data Structuring?
Slide: Structured vs Unstructured
Talking Points:
•Data structuring is converting raw, messy data into organized formats:
tables, arrays, JSON, etc.
•Essential for enabling feature extraction, analysis, and feeding into ML
models.
•Types of structuring:
• Tabular structuring
• Time-series transformation
• Categorical encoding
•Structured data improves reproducibility, automation, and traceability.
What is Data Structuring?
Data structuring is the process of organizing, cleaning, and formatting data so that it can be
easily accessed and analyzed. Think of it like cleaning and sorting ingredients before you
cook. You may have all the right data, but if it's not structured, it's hard to work with.

Why Data Structuring Matters?


Structured data:
•Improves data quality and consistency
•Enables faster and more accurate analysis
•Reduces errors in models and dashboards
•Makes collaboration and data sharing easier
Common Issues in Raw Data:
• Missing values
• Duplicate records
• Inconsistent formats (e.g., “NY” vs. “New York”)
• Outliers or invalid entries
• Unnecessary or redundant fields
Key Steps in Structuring Data
• Cleaning – Remove duplicates, fill or drop missing values, fix typos.
• Formatting – Standardize dates, currencies, categories, etc.
• Transforming – Convert columns, normalize values, create new features.
• Labeling – Add headers, metadata, or classification tags.
• Storing – Organize in tables, files, or databases that are easy to access.
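A minimal pandas sketch of these steps, assuming a hypothetical raw_orders.csv with the kinds of problems listed above:

```python
import pandas as pd

# Hypothetical raw export with the kinds of problems listed above.
df = pd.read_csv("raw_orders.csv")

# Cleaning: drop exact duplicates and rows missing the key fields.
df = df.drop_duplicates()
df = df.dropna(subset=["order_id", "order_date"])

# Formatting: standardize dates and category spellings.
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
df["state"] = df["state"].replace({"NY": "New York"})

# Transforming: derive a new feature from an existing column.
df["order_month"] = df["order_date"].dt.month

# Labeling and storing: clear column names, saved in an accessible format.
df = df.rename(columns={"amt": "order_amount_usd"})
df.to_csv("structured_orders.csv", index=False)
```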
Structured vs. Unstructured Data
Let’s quickly compare:
• Structured: Excel tables, SQL databases, CSV files. Easy to search, filter,
and model.
• Unstructured: PDFs, emails, images, audio. Harder to organize, but still
valuable; often structured later using NLP, OCR, etc.
Tools for Structuring Data
Some popular tools:
• Excel/Google Sheets – great for small datasets
• Python (Pandas, NumPy) – powerful for automation and cleaning
• SQL – ideal for relational data
• ETL tools – like Talend, Apache NiFi, or Alteryx
• Cloud platforms – AWS, Azure, GCP offer data pipelines

Real-World Example
Suppose we collected e-commerce data with thousands of transactions.
We’d need to:
•Standardize product names
•Remove incomplete records
•Convert all currencies to USD
•Split full names into first and last names
•Store everything in a consistent schema (like “date | product | price | customer ID”)
Only then can we perform reliable analysis or feed it into ML models.
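A possible pandas sketch of this example; the file name, column names, and exchange rates are illustrative assumptions, not a prescribed implementation:

```python
import pandas as pd

df = pd.read_csv("ecommerce_transactions.csv")  # hypothetical export

# Standardize product names.
df["product"] = df["product"].str.strip().str.title()

# Remove incomplete records.
df = df.dropna(subset=["product", "price", "currency", "customer_name"])

# Convert all currencies to USD (illustrative fixed rates).
rates_to_usd = {"USD": 1.0, "EUR": 1.09, "GBP": 1.27}
df["price_usd"] = df["price"] * df["currency"].map(rates_to_usd)

# Split full names into first and last names.
df[["first_name", "last_name"]] = df["customer_name"].str.split(" ", n=1, expand=True)

# Store everything in a consistent schema.
df = df[["date", "product", "price_usd", "customer_id", "first_name", "last_name"]]
df.to_parquet("transactions_clean.parquet", index=False)
```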
1. You're analyzing task logs for productivity trends. Each row contains a task description,
user ID, and timestamp, but no clear task categories. What should you do to make this
dataset usable for supervised ML?
A. Remove the task description column since it’s unstructured
B. Create a new "task category" feature by applying text classification or manual tagging
C. Ignore categories and treat all tasks as the same
D. Randomly assign category labels to balance the dataset
Ans: B

2. Your dataset has timestamps stored as strings in multiple formats (e.g., “05/31/25”,
“2025-05-31T14:00”). What should be your first step in structuring this data for ML
modeling?
A. Use the strings as-is since the date is visible
B. Drop all inconsistent rows
C. Standardize all timestamps to a common format and convert to datetime objects
D. Extract only the year from the strings for simplicity
Ans: C
Task Logs as Datasets
Task logs are chronological records of actions taken in a system—each line is an
event.
•Contains rich behavioral data:
• Who did what?
• When was it done?
• What kind of task was it?
• What was the outcome?
•In ML:
• Logs can be used to build classification, prediction, and recommendation
models.
• Examples: Predict next action, flag anomalies, estimate completion time.
Use Case: Project management system predicting delays based on task history.
• [Task ID] | [User] | [Action] | [Category] | [Time Stamp] | [Status]
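As a small illustration, the pipe-delimited format above could be parsed into a DataFrame roughly like this (the two log lines are invented examples):

```python
import io
import pandas as pd

# Two hypothetical log lines in the format shown above.
raw_log = io.StringIO(
    "T-101 | alice | Create Task | Reporting | 2025-05-31T09:05:00Z | Open\n"
    "T-101 | bob   | Complete    | Reporting | 2025-05-31T11:40:00Z | Done\n"
)

cols = ["task_id", "user", "action", "category", "timestamp", "status"]
logs = pd.read_csv(raw_log, sep="|", names=cols, skipinitialspace=True)

# Trim stray whitespace and parse the timestamp column.
logs = logs.apply(lambda s: s.str.strip() if s.dtype == "object" else s)
logs["timestamp"] = pd.to_datetime(logs["timestamp"], utc=True)
print(logs)
```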
What is a Task Log?
A Task Log is a structured record of the work being done in a project.
It keeps track of:

•What was done


•Who did it
•When it was done
•Why it was done

Why Maintain a Task Log?


There are several key reasons:
•Transparency – Everyone knows what’s happening
•Accountability – Each task has an owner
•Reproducibility – You or someone else can retrace your steps later
•Progress Tracking – Helps in managing deadlines and deliverables
•Error Handling – Easier to trace back where something went wrong
What to Include in a Task Log?
A simple task log might include:

•Task name or ID
•Description of the task
•Assigned person
•Start and end date
•Status (Pending, In Progress, Completed)
•Notes or observations
•Next steps (if any)
Importance of Time Stamps in ML
Slide: Time-Based Features in Machine Learning
•Timestamps provide a temporal context to data.
•Essential for time-series models, sequential pattern recognition, churn prediction.
•Can be used to derive:
• Task duration
• Task frequency
• Gaps between tasks
• Peak activity hours
•Considerations:
• Time zones
• Missing values
• Consistent formats (ISO 8601, Unix)
• Granularity (second/minute/hour)
ML Models That Use Time:
•LSTM
•ARIMA
•Prophet
•Transformer-based models
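A short pandas sketch of how such time-based features might be derived; the events table and its columns are hypothetical:

```python
import pandas as pd

# Hypothetical task events, already converted to UTC datetimes.
events = pd.DataFrame({
    "user": ["alice", "alice", "alice", "bob"],
    "start": pd.to_datetime(
        ["2025-05-31 09:00", "2025-05-31 11:30", "2025-06-01 10:00", "2025-05-31 14:00"]),
    "end": pd.to_datetime(
        ["2025-05-31 10:15", "2025-05-31 12:00", "2025-06-01 10:45", "2025-05-31 15:30"]),
})

# Task duration in minutes.
events["duration_min"] = (events["end"] - events["start"]).dt.total_seconds() / 60

# Gap since the user's previous task.
events = events.sort_values(["user", "start"])
events["gap_hours"] = (
    events.groupby("user")["start"].diff().dt.total_seconds() / 3600
)

# Peak-activity-hour and task-frequency style features.
events["start_hour"] = events["start"].dt.hour
tasks_per_day = events.groupby(["user", events["start"].dt.date]).size()
print(events)
print(tasks_per_day)
```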
What is a Timestamp?
A timestamp is a sequence of characters or encoded information that identifies when a certain
event occurred. In datasets, it usually represents the date and time something happened, like
when a user logged in, when a transaction was made, or when a sensor recorded data.
Examples:
•2025-05-31 10:15:30
•31/05/2025 10:15 AM
•May 31, 2025 – 10:15

Why Timestamps Matter?


•Track behavior over time (e.g., customer login patterns)
•Analyze trends (e.g., weekly sales, peak hours)
•Build time-based features (e.g., day of the week, hour of the day)
•Work with time series models (e.g., forecasting demand)
•Ensure data integrity (e.g., correct sequencing of events)
Common Timestamp Formats

Timestamps can appear in many formats, such as:

•ISO 8601: 2025-05-31T10:15:30Z


•UNIX Timestamp: 1748686530 (seconds since Jan 1, 1970; the same instant as the ISO 8601 example above)
•Custom formats: 31-May-2025 10:15 AM
When working with timestamps, standardizing the format is key to avoid
errors.
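A small pandas sketch that brings the example formats above onto one timezone-aware representation (the Unix value is the one quoted earlier, i.e., the same instant as the ISO 8601 example):

```python
import pandas as pd

# The example formats above, expressed as timezone-aware datetimes.
iso = pd.to_datetime("2025-05-31T10:15:30Z", utc=True)
custom = pd.to_datetime("31-May-2025 10:15 AM",
                        format="%d-%b-%Y %I:%M %p").tz_localize("UTC")
unix = pd.to_datetime(1748686530, unit="s", utc=True)

print(iso, custom, unix, sep="\n")
print(iso == unix)  # True: both encode 2025-05-31 10:15:30 UTC
```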
1. You’re working with a system that logs user activities like "Login", "Create Task", "Submit File", and "Logout".
Each log includes a timestamp and user ID. You need to predict user churn. What's your first structuring step?
A. Remove repetitive actions from logs
B. Aggregate logs into user sessions and extract features like session length and action frequency
C. Keep logs as-is and apply linear regression
D. Randomly sample logs for faster processing
Ans: B

2. Your task logs are collected from multiple microservices and stored in separate files. Some actions (e.g.,
“Approve Task”) span multiple services. What’s the best approach to make the logs useful for ML?
A. Analyze one microservice log at a time
B. Join the logs based on common task IDs and order them chronologically
C. Randomly merge logs
D. Drop entries with missing service info
Ans: B
Data Structuring Workflow for ML

Slide: Step-by-Step Pipeline


Workflow Steps:
1.Data Ingestion – From logs, sensors, APIs
2.Data Cleaning – Remove duplicates, nulls, inconsistencies
3.Data Normalization – Format timestamps, unify category names
4.Feature Engineering – Time since last task, task density, encoded categories
5.Splitting & Labeling – For training, validation, testing
Tools Mentioned:
•pandas, numpy, datetime, scikit-learn
•ETL tools: Apache Airflow, Talend
•Data validation: great_expectations, pandas-profiling
Data Ingestion
This is where we bring the data into our environment. Sources may include:
•CSV/Excel files
•Databases
•APIs
•Cloud storage
•Web scraping
Tools: pandas.read_csv(), SQL connectors, ETL platforms
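A minimal ingestion sketch, assuming a hypothetical CSV export and a local SQLite database; other sources only change the connector used:

```python
import sqlite3
import pandas as pd

# From a flat file (hypothetical path).
orders = pd.read_csv("exports/orders.csv")

# From a database via SQL (SQLite shown for simplicity; other databases
# only require a different connection object or SQLAlchemy engine).
conn = sqlite3.connect("company.db")
users = pd.read_sql_query("SELECT id, name, signup_date FROM users", conn)
conn.close()

print(orders.shape, users.shape)
```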

Data Inspection
Before touching the data, we need to understand its structure and quality.
•View sample rows
•Check data types
•Identify missing or inconsistent values
•Look at value distributions
Example: Use df.head(), df.info(), df.describe() in Python
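A quick inspection sketch in pandas (the file path is a placeholder carried over from the ingestion step):

```python
import pandas as pd

df = pd.read_csv("exports/orders.csv")  # hypothetical file from the ingestion step

print(df.head())        # sample rows
df.info()               # column names, dtypes, non-null counts (prints directly)
print(df.describe())    # distributions of numeric columns
print(df.isna().sum())  # missing values per column
print(df.nunique())     # distinct values per column; helps spot inconsistent labels
```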
Data Cleaning
“This is where we remove or fix problems in the raw data”
•Handle missing values
•Correct typos or anomalies
•Remove duplicates
•Standardize formats (dates, currencies, categories)
Cleaning ensures your data is accurate, consistent, and complete.
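A possible cleaning sketch; the column names and replacement rules are illustrative assumptions:

```python
import pandas as pd

df = pd.read_csv("exports/orders.csv")  # hypothetical raw data

# Handle missing values: fill where a default makes sense, drop otherwise.
df["discount"] = df["discount"].fillna(0)
df = df.dropna(subset=["order_id", "price"])

# Correct typos and inconsistent labels.
df["city"] = df["city"].str.strip().replace({"NYC": "New York", "N.Y.": "New York"})

# Remove duplicates.
df = df.drop_duplicates(subset=["order_id"])

# Standardize formats (dates and currency precision).
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
df["price"] = df["price"].round(2)
```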
Data Transformation
Now we reshape and enhance the data for analysis or modeling:
•Convert data types
•Normalize or scale numeric values
•Encode categorical data (e.g., One-Hot Encoding)
•Create new features (e.g., extracting “month” from a date)
This step adds analytical value to your dataset.
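A brief transformation sketch using pandas and scikit-learn; the toy DataFrame below stands in for real data:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "price": [10.0, 250.0, 99.5],
    "category": ["books", "electronics", "books"],
    "order_date": pd.to_datetime(["2025-05-01", "2025-05-15", "2025-06-02"]),
})

# Normalize or scale numeric values to a common range.
df["price_scaled"] = MinMaxScaler().fit_transform(df[["price"]]).ravel()

# Encode categorical data (One-Hot Encoding).
df = pd.get_dummies(df, columns=["category"], prefix="cat")

# Create new features, e.g. extracting the month from a date.
df["order_month"] = df["order_date"].dt.month
print(df)
```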
Data Integration

If you have data from multiple sources, you’ll need to merge or join them:
•Combine tables based on keys (e.g., user ID, transaction ID)
• Ensure consistent naming and units
•Resolve conflicts in overlapping fields
Tools: pd.merge(), SQL JOIN, Power BI or Tableau connectors.
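A minimal integration sketch with pd.merge(); the two toy tables and the choice of a left join are illustrative:

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "user_id": [10, 11, 10],
    "amount_usd": [20.0, 35.5, 12.0],
})
users = pd.DataFrame({
    "user_id": [10, 11],
    "country": ["US", "DE"],
})

# Combine tables on a shared key. A left join keeps every order even if
# the matching user record is missing, which makes integration gaps visible.
merged = pd.merge(orders, users, on="user_id", how="left")
print(merged)
```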

Data Validation
Once transformed and integrated, we validate the dataset:
•Run sanity checks (e.g., no negative sales)
•Check column ranges and uniqueness
•Ensure row counts match expectations
Validation helps catch unnoticed errors before analysis or modeling.
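A simple validation sketch using plain assertions; the checks mirror the bullets above, and a library such as great_expectations could express the same rules more formally:

```python
import pandas as pd

df = pd.DataFrame({
    "order_id": [1, 2, 3],
    "sales": [20.0, 35.5, 12.0],
    "order_date": pd.to_datetime(["2025-05-01", "2025-05-15", "2025-06-02"]),
})

# Sanity checks: fail fast if an assumption about the data is violated.
assert (df["sales"] >= 0).all(), "Negative sales values found"
assert df["order_id"].is_unique, "Duplicate order IDs found"
assert df["order_date"].between("2020-01-01", "2030-12-31").all(), "Date out of range"
assert len(df) == 3, "Unexpected row count"
print("All validation checks passed")
```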
Data Storage
Finally, store the structured data for use:
• Save as CSV, JSON, or Parquet files
• Load into databases
• Push to cloud storage (e.g., S3, Azure Blob)
Choose a format that aligns with your workflow and team needs.
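A short storage sketch; the output directory and file names are placeholders, and the Parquet line assumes pyarrow or fastparquet is installed:

```python
from pathlib import Path
import pandas as pd

df = pd.DataFrame({"date": ["2025-05-31"], "product": ["Laptop"], "price_usd": [899.99]})

Path("clean").mkdir(exist_ok=True)
df.to_csv("clean/orders.csv", index=False)           # human-readable, universal
df.to_parquet("clean/orders.parquet", index=False)   # compact and typed (needs pyarrow)
df.to_json("clean/orders.json", orient="records")    # easy to exchange with services
```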

What is Feature Engineering?


Feature engineering is the process of:
•Creating new input variables (features)
•Transforming existing features
•Selecting the most relevant features
that help machine learning models perform better.
“It’s like crafting the right questions to get better answers from your data.”
Why Feature Engineering Matters
Feature engineering helps:
•Improve model accuracy
•Reduce overfitting
•Reveal patterns and relationships
•Simplify complex data
Without meaningful features, even the most advanced ML model won’t perform well.
Common Feature Engineering Techniques
1.Binning – Grouping continuous data (e.g., age into age groups)
2.Encoding – Converting categories into numbers (e.g., One-Hot, Label Encoding)
3.Normalization/Scaling – Bringing numeric features to the same range
4.Date/Time Extraction – From timestamps to day, hour, or season
5.Text Features – Word count, sentiment score, TF-IDF
6.Interaction Features – Multiplying or combining two variables
7.Log Transformation – Reducing skewness in numerical features
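A compact sketch of several of these techniques on a toy DataFrame (all column names and bin edges are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [19, 34, 52, 71],
    "city": ["NY", "LA", "NY", "SF"],
    "income": [30_000, 85_000, 120_000, 40_000],
    "signup": pd.to_datetime(["2025-01-05", "2025-03-20", "2025-05-31", "2025-02-14"]),
})

# Binning: group continuous age into labeled ranges.
df["age_group"] = pd.cut(df["age"], bins=[0, 25, 45, 65, 120],
                         labels=["<25", "25-44", "45-64", "65+"])

# Encoding: one-hot encode the city category.
df = pd.get_dummies(df, columns=["city"], prefix="city")

# Log transformation: reduce skew in income.
df["log_income"] = np.log1p(df["income"])

# Date/time extraction: day of week from the signup timestamp.
df["signup_dow"] = df["signup"].dt.dayofweek

# Interaction feature: combine two variables.
df["income_per_year_of_age"] = df["income"] / df["age"]
print(df.head())
```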
Splitting and Labeling
“What is Splitting?”
Splitting is the process of dividing your dataset into:
•Training Set – Data used to train the model
•Validation Set – (Optional) For tuning hyperparameters
•Test Set – Data used to evaluate the final model
A common split ratio is:
•70% Training
•15% Validation
•15% Testing.
Use train_test_split() in Python (from sklearn) to do this.
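A sketch of the 70/15/15 split using two calls to train_test_split, with stratification on a toy label column:

```python
from sklearn.model_selection import train_test_split
import pandas as pd

# Hypothetical structured dataset: X = features, y = label.
df = pd.DataFrame({"f1": range(100), "f2": range(100, 200), "label": [0, 1] * 50})
X, y = df[["f1", "f2"]], df["label"]

# First carve off 70% for training, then split the remaining 30% in half
# (15% validation, 15% test). stratify keeps class proportions stable.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 70, 15, 15
```

Splitting twice is simply one convenient way to get three sets from a single helper; fixing random_state makes the split reproducible.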
Why Splitting is Important?
Without splitting, the model might “memorize” the data and perform well only on
what it has already seen. This leads to overfitting: good performance on training
data, poor performance in the real world.
What is Labeling?
Labeling is the process of identifying the target variable — the value you want the model to
predict.
Examples:
•In a spam detection model:
•Input = Email text
•Label = Spam or Not Spam
•In a sales prediction model:
•Input = Product, Region, Month
•Label = Sales amount
Accurate and clean labels are essential for supervised learning.
Best Practices for Splitting & Labeling
•Always randomize before splitting
•Use stratified splitting for imbalanced classes (e.g., fraud detection)
•Ensure no data leakage — don’t use future data to predict the past
•Keep labels separate from features in your processing pipeline

What is ETL?
The ETL process involves three major steps:
1.Extract – Pulling data from various sources (databases, APIs, files)
2.Transform – Cleaning, formatting, and structuring data
3.Load – Moving the transformed data into a destination like a database, data
warehouse, or data lake.
ETL makes raw data usable by converting it into a structured, consistent format.
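A compact ETL sketch, assuming a hypothetical CSV source and a local SQLite database as the destination:

```python
import sqlite3
import pandas as pd

# Extract: pull raw data from a source file (hypothetical path).
raw = pd.read_csv("exports/raw_sales.csv")

# Transform: clean, format, and structure.
raw = raw.drop_duplicates()
raw["sale_date"] = pd.to_datetime(raw["sale_date"], errors="coerce")
raw = raw.dropna(subset=["sale_date", "amount"])

# Load: write the structured result into a destination database.
conn = sqlite3.connect("warehouse.db")
raw.to_sql("sales", conn, if_exists="replace", index=False)
conn.close()
```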
Case Study Example

Slide: Mini Case Study – Predicting Employee Overload from Task Logs
Scenario:
•Input: Task logs from an internal tool
•Structured features:
• Number of tasks/day
• Avg task completion time
• Category spread
• Time active per day
•Model: Random Forest Classifier to predict overload risk
Outcome:
•Model helps assign support proactively
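A rough sketch of how such a model could be trained; the feature values and the overload label below are synthetic stand-ins for real task-log features, not the actual case-study data:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical per-employee features like those listed above.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "tasks_per_day": rng.poisson(6, n),
    "avg_completion_hours": rng.normal(3.0, 1.0, n).clip(0.2),
    "category_spread": rng.integers(1, 6, n),
    "active_hours_per_day": rng.normal(7.0, 1.5, n).clip(1),
})
# Synthetic overload label for illustration only.
y = ((X["tasks_per_day"] > 8) & (X["active_hours_per_day"] > 8)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```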
1)Your company wants to analyze productivity trends from an internal project management tool. It logs
all tasks with timestamps, user actions, and task categories. However, data is spread across multiple
inconsistent Excel sheets. What’s the best first step in data gathering?
A. Start training a model using one of the Excel files
B. Manually clean each file separately
C. Consolidate the files into a central structured format (e.g., CSV or database) and standardize the
schema
D. Request employees to input their tasks again using a new format
Ans: C

2. You’re reviewing task logs that include timestamped actions: “Start Task,” “Pause,” “Resume,” and
“Complete.” You need to structure this data to predict delays. What’s the best approach?
A. Count the total number of actions per task
B. Calculate actual task durations by pairing relevant timestamp events
C. Focus only on “Start” and “Complete” actions
D. Drop tasks that contain “Pause” or “Resume” events

Ans: B
3. While gathering user activity data for an ML model, you notice some logs are missing timestamps, and others
have them in mixed formats. What’s the correct approach during data structuring?
A. Ignore the missing timestamps—they’re not critical
B. Convert all timestamps to a consistent format (e.g., ISO 8601) and impute missing values based on nearest actions
C. Replace missing timestamps with zeros
D. Use the timestamp column as a text string
✅ Correct Answer: B
Why: Consistent, complete timestamps are crucial for time-based features and modeling.

4. You're building a classifier to categorize incoming tasks based on historical logs. The “Task Type” column has 300
unique values, many of which appear only once. What’s the most efficient way to handle this during structuring?
A. One-hot encode all 300 values
B. Group rare values into “Other” and apply frequency encoding
C. Drop the column
D. Encode all categories as integers from 1 to 300
✅ Correct Answer: B
Why: Grouping and encoding reduces noise and dimensionality while preserving information.
