Hello Techie! Have you heard about Apache Airflow? Do you know what Airflow is used for? If you don't, no need to worry. You are in the right place.
In this post, we will demonstrate how Apache Airflow works. We are not going to discuss the installation of Apache Airflow here. We will first go through a short introduction and then focus on how we can use Apache Airflow.
So, let's start…
Apache Airflow is an open-source workflow automation and scheduling platform. It is used as a data pipeline building tool and is similar to Apache Oozie, Azkaban, and Luigi.
In Apache Airflow, we create DAGs (Directed Acyclic Graphs) using Python code. A DAG includes the following information:
· A configuration that outlines HOW to execute the tasks
· A collection of tasks
· The order of execution of those tasks
· The time of execution (the schedule)
The DAG diagram above shows the order of execution of the tasks. Task A executes first, and then the other tasks execute according to the flow. Depending on our business requirement, we can change the order in which the tasks execute.
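As a rough illustration (assuming Airflow 1.x; the dag_id and the task names task_A to task_D below are made up to match the diagram), such a DAG could be sketched like this:

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

# Configuration that outlines HOW and WHEN the tasks execute
default_args = {
    "owner": "airflow",
    "start_date": datetime(2020, 1, 1),
}

dag = DAG(
    dag_id="order_of_execution_demo",  # hypothetical dag_id
    default_args=default_args,
    schedule_interval="@daily",        # time of execution (schedule)
)

# Collection of tasks (DummyOperator used only as a placeholder)
task_A = DummyOperator(task_id="task_A", dag=dag)
task_B = DummyOperator(task_id="task_B", dag=dag)
task_C = DummyOperator(task_id="task_C", dag=dag)
task_D = DummyOperator(task_id="task_D", dag=dag)

# Order of execution: task_A runs first, then task_B and task_C, then task_D
task_A >> [task_B, task_C]
[task_B, task_C] >> task_D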
Before diving into the Python programming part, let's take a glance at the Apache Airflow console.
So far, we have gotten a basic idea about Airflow. If you did not get anything from the above discussion, don't worry. From here onwards we will learn about Apache Airflow practically. Here we are assuming that you are using a Linux machine or instance for Apache Airflow.
Airflow Commands:
# To check the Airflow version
airflow version

# To initialize the database where Airflow saves the workflows and their states
airflow initdb

# To list the DAGs (it shows only the dag ids)
airflow list_dags

# To list the tasks for a specific dag id
airflow list_tasks <dag_id>
Operator
An operator describes a single task in a workflow. Operators determine what executes when your DAG runs. Airflow provides operators for many common tasks, including:
- BashOperator – executes a bash command
- PythonOperator – calls an arbitrary Python function
- EmailOperator – sends an email
- SimpleHttpOperator – sends an HTTP request
- MySqlOperator, SqliteOperator, PostgresOperator, MsSqlOperator, OracleOperator, JdbcOperator, etc. – executes a SQL command
- Sensor – waits for a certain time, file, database row, S3 key, etc.
In addition to these basic building blocks, there are many more specific operators: DockerOperator, HiveOperator, S3FileTransformOperator, PrestoToMysqlOperator, SlackOperator, and so on.
Here we will discuss only the BashOperator and the PythonOperator. If you want to use operators other than these two, you can follow the official Apache Airflow documentation (https://airflow.apache.org/_modules/index.html).
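For instance, a minimal sketch of a PythonOperator task (assuming Airflow 1.x; the dag_id, task_id, and function below are illustrative only) could look like this:

from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def print_hello():
    # the arbitrary Python function that the operator will call
    print("Hello from PythonOperator")

dag = DAG(
    dag_id="python_operator_demo",  # hypothetical dag_id
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,         # run only when triggered manually
)

hello_task = PythonOperator(
    task_id="print_hello",
    python_callable=print_hello,
    dag=dag,
)

The BashOperator is used in Example 1 below.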
Firstly, open the terminal on the machine where you have installed Apache Airflow and go to the Airflow directory. You will be able to see the sub-directories shown below.
Then go to the dags directory and create a Python file (the file name can be anything you want) with a .py extension, and write the code in this file.
Example 1: (Hello_World.py)
In this example, we will use the BashOperator for the tasks.
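A minimal sketch of what Hello_World.py might contain (assuming Airflow 1.x; the dag_id, task ids, and echo commands are illustrative choices) is:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "airflow",
    "start_date": datetime(2020, 1, 1),
}

# The dag_id must be unique; it is what "airflow list_dags" displays
dag = DAG(
    dag_id="hello_world",
    default_args=default_args,
    schedule_interval="@daily",
)

# Each BashOperator task runs a shell command
task_hello = BashOperator(
    task_id="say_hello",
    bash_command='echo "Hello"',
    dag=dag,
)

task_world = BashOperator(
    task_id="say_world",
    bash_command='echo "World"',
    dag=dag,
)

# say_hello runs first, then say_world
task_hello >> task_world

After saving this file in the dags directory, "airflow list_dags" should show hello_world, and "airflow list_tasks hello_world" should show say_hello and say_world.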