2.2 Create an Airflow DAG to read and write files using the PythonOperator

This document provides a step-by-step guide to creating an Airflow DAG for reading and writing files using the PythonOperator. It includes instructions for setting up the Airflow environment, creating a DAG with tasks to read from and write to CSV files, and running the DAG. Additional tips for file handling and task dependencies are also included.



✅ Step-by-Step Guide to Airflow File Processing

1. Set Up the Airflow Environment

Ensure Airflow is installed (pip install apache-airflow); a quick way to verify the install is sketched after these setup steps.

Initialize the Airflow DB:


airflow db init

Start the Airflow scheduler & webserver:


airflow scheduler & airflow webserver

Access the Airflow UI at http://localhost:8080.
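As an optional check (a small sketch, not part of the original steps), you can confirm Airflow is importable from the same Python environment before moving on:

import airflow

# Print the installed version; this guide assumes an Airflow 2.x release.
print(airflow.__version__)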

2. Create the Airflow DAG

1. Navigate to the DAGs folder:


cd ~/airflow/dags

2. Create a new DAG file, e.g., file_processing_dag.py:


from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime
import pandas as pd

# Define default args
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2024, 1, 1),
    'retries': 1
}

# Define DAG
with DAG('file_processing_dag',
         default_args=default_args,
         schedule_interval='@daily',
         catchup=False) as dag:

    # Task 1: Read file
    def read_file():
        data = pd.read_csv('/path/to/input.csv')
        print(data.head())

    # Task 2: Write file
    def write_file():
        data = pd.DataFrame({
            'Name': ['Alice', 'Bob', 'Charlie'],
            'Age': [24, 27, 22]
        })
        data.to_csv('/path/to/output.csv', index=False)
        print("File written successfully.")

    # Define Python Operators
    read_task = PythonOperator(
        task_id='read_csv_file',
        python_callable=read_file
    )

    write_task = PythonOperator(
        task_id='write_csv_file',
        python_callable=write_file
    )

    # Set task dependencies
    read_task >> write_task
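Before moving on, it can help to confirm that the new DAG file imports cleanly, since import errors keep a DAG out of the UI. The snippet below is a minimal sketch (not part of the original guide); it assumes the file was saved to the default ~/airflow/dags folder used above.

import importlib.util
from pathlib import Path

# Load the DAG file directly; any syntax or import error will surface here.
dag_path = Path.home() / "airflow" / "dags" / "file_processing_dag.py"
spec = importlib.util.spec_from_file_location("file_processing_dag", dag_path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

# The 'dag' object defined in the with-block should now be available.
print(module.dag.dag_id)  # expected: file_processing_dag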

3. Run the DAG

1. Place your input file at the specified path (/path/to/input.csv).

2. Refresh the Airflow UI and trigger the DAG.

3. Check the task logs for the file read/write confirmation; a quick way to verify the output file from Python is sketched below.
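As an extra sanity check (an optional sketch, assuming the DAG has run and the paths from the example are unchanged), the output file can be read back with pandas:

import pandas as pd

# Read back the file produced by the write_csv_file task.
result = pd.read_csv('/path/to/output.csv')
print(result)

# Expected contents, based on the DataFrame built in write_file():
#       Name  Age
# 0    Alice   24
# 1      Bob   27
# 2  Charlie   22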

💡 Tips:

Ensure the file paths exist and that the Airflow worker has permission to read the input file and write the output file.

Use XComs if you want to pass data between tasks instead of intermediate files; a minimal sketch follows after these tips.

Explore the BashOperator for shell commands if needed.
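For the XCom tip above, here is a minimal sketch of how the two callables could exchange data directly, assuming the same task IDs as the DAG in this guide (read_csv_file and write_csv_file). A value returned by a PythonOperator callable is pushed to XCom automatically; because XCom payloads should stay small and serializable, the DataFrame is converted to a list of dicts first. The ti (task instance) argument is supplied by Airflow at runtime.

import pandas as pd

def read_file():
    data = pd.read_csv('/path/to/input.csv')
    # Returning a value pushes it to XCom under the key 'return_value'.
    return data.to_dict(orient='records')

def write_file(ti):
    # Pull whatever the read_csv_file task returned.
    records = ti.xcom_pull(task_ids='read_csv_file')
    pd.DataFrame(records).to_csv('/path/to/output.csv', index=False)

The PythonOperator definitions and the read_task >> write_task dependency stay the same; for large files, passing paths (or external storage) is still preferable to shipping the data through XCom.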
