Parent: Production grade airflow scripting


Why?

You do this to draw a clear line: Airflow is just an orchestrator, and it should be kept separate from the business logic.

The golden rule

Your DAG file should read like a workflow diagram, not like an implementation. It should only describe what happens and when it happens.

Tasks should be lean, clean, and easy to read. Part of the reason is that DAG files are read many times as they are processed; putting too much in your DAGs causes overhead for the DAG Processor.
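One concrete way to keep parse-time overhead down is to move expensive imports and I/O inside the task callable, since anything at module level runs on every parse. A minimal sketch (the file path and function name are hypothetical, not from this note):

```python
# Anything at the top level of a DAG file runs every time the DAG Processor
# parses it. Heavy imports and file/DB reads belong inside the callable.

# BAD: executed on every parse, even when no task runs.
# import pandas as pd
# RULES = pd.read_csv("/opt/config/rules.csv")

def load_rules(path):
    """Load rules only when the task actually executes."""
    import csv  # deferred import: paid at task run time, not parse time

    with open(path) as f:
        return list(csv.DictReader(f))
```

The same idea applies to hooks and connections: build them inside the task function, never at module level.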

How?

This note has a prerequisite: Creating python modules

The gold standard for a task looks like this:

from airflow.decorators import task
from airflow.providers.microsoft.mssql.hooks.mssql import MsSqlHook

from compacter.sql_utils import get_pending_sql_keys

@task
def get_keys(distrik, limit):
    # Airflow owns the connection; the module owns the logic.
    engine = MsSqlHook(conn_id='mssql_creds').get_sqlalchemy_engine()
    return get_pending_sql_keys(engine, distrik, limit)

Very lean, no logic involved; it just tells you what gets run and when.
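For contrast, the business-logic side might look like the sketch below. This is hypothetical (the real compacter.sql_utils presumably targets MSSQL through the SQLAlchemy engine); the point is that the function accepts a connection/engine and knows nothing about Airflow:

```python
# Hypothetical sketch of a module like compacter/sql_utils.py: plain Python,
# no Airflow imports, fully testable on its own. The table name, columns,
# and signature are illustrative only.

def get_pending_sql_keys(conn, distrik, limit):
    """Return the keys still awaiting processing for one district."""
    cur = conn.execute(
        "SELECT key FROM pending_work WHERE distrik = ? LIMIT ?",
        (distrik, limit),
    )
    return [row[0] for row in cur.fetchall()]
```

Because the function takes any DB-API-style connection, it can be unit tested against an in-memory SQLite database without spinning up Airflow at all.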

Read how to grab DAG params here: Accessing Airflow Params

Resources: