
Simple Heuristics for Scheduling Apache Airflow: A Case Study at PT. X

HANS, and Bisono, Indriati Njoto and SOEWANDI, HANIJANTO (2022) Simple Heuristics for Scheduling Apache Airflow: A Case Study at PT. X. [UNSPECIFIED]


      Abstract

      In the big data era, large volumes of data must be handled properly to support well-informed business decisions, so the data needs to be updated regularly to keep the analysis relevant. Apache Airflow is commonly used to schedule these updates by running queries through sequences of DAG (Directed Acyclic Graph) tasks. Unfortunately, scheduling DAG tasks is an NP-hard problem, for which a globally optimal solution is difficult to obtain. This paper presents a simple heuristic algorithm for scheduling a subset of the DAG tasks at PT. X on 2 computers (virtual machines). This is a special case of the P_m|prec|C_max problem in which 2 computers are used. Before constructing the algorithm, CPM (Critical Path Method) is used as the baseline for the optimal solution. A knapsack problem is then used to assign additional tasks to virtual machine #1, and a heuristic is used for virtual machine #2, as a compromise that balances cost and completion time. The characteristics of this simple algorithm are discussed with numerical examples/experiments. Keywords: Heuristics, Apache Airflow, Directed Acyclic Graphs, Critical Path Method.
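As a rough illustration of the problem setting described in the abstract, the sketch below computes a CPM baseline (the makespan with unlimited machines) and then applies a generic greedy list-scheduling heuristic on 2 machines. This is not the authors' knapsack-based algorithm, whose details are not given here; the task names, durations, precedence edges, and function names are all hypothetical, and the code assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical task durations and precedence constraints (not data from the paper).
durations = {"A": 3, "B": 2, "C": 4, "D": 2, "E": 1}
preds = {"A": set(), "B": set(), "C": {"A"}, "D": {"A", "B"}, "E": {"C", "D"}}


def critical_path_length(durations, preds):
    """CPM forward pass: finish time of the DAG if machines were unlimited."""
    finish = {}
    for task in TopologicalSorter(preds).static_order():
        start = max((finish[p] for p in preds[task]), default=0)
        finish[task] = start + durations[task]
    return max(finish.values())


def tail_lengths(durations, preds):
    """Longest path from each task to the end of the DAG, including the task."""
    succs = {t: set() for t in durations}
    for task, ps in preds.items():
        for p in ps:
            succs[p].add(task)
    tail = {}
    for task in reversed(list(TopologicalSorter(preds).static_order())):
        tail[task] = durations[task] + max((tail[s] for s in succs[task]), default=0)
    return tail


def list_schedule(durations, preds, machines=2):
    """Greedy heuristic: take the ready task with the longest remaining path
    and place it on the machine that becomes free first."""
    tail = tail_lengths(durations, preds)
    free_at = [0] * machines
    finish, schedule = {}, {}
    remaining = set(durations)
    while remaining:
        # A task is ready once all of its predecessors have been scheduled.
        ready = [t for t in remaining if all(p in finish for p in preds[t])]
        task = max(ready, key=lambda t: (tail[t], t))  # name breaks ties
        m = min(range(machines), key=lambda i: free_at[i])
        start = max(free_at[m], max((finish[p] for p in preds[task]), default=0))
        finish[task] = start + durations[task]
        free_at[m] = finish[task]
        schedule[task] = (m, start, finish[task])
        remaining.remove(task)
    return schedule, max(finish.values())


baseline = critical_path_length(durations, preds)
schedule, makespan = list_schedule(durations, preds, machines=2)
print("CPM baseline:", baseline)
print("2-machine makespan:", makespan)
for task, (machine, start, end) in sorted(schedule.items()):
    print(f"{task}: machine {machine}, start {start}, end {end}")
```

The CPM value is a lower bound on the completion time for any number of machines, so comparing it with the 2-machine makespan gives a simple measure of how much a heuristic loses by restricting the schedule to two virtual machines.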

      Item Type: UNSPECIFIED
      Subjects: H Social Sciences > HD Industries. Land use. Labor > HD28 Management. Industrial Management
      Divisions: Faculty of Industrial Technology > Industrial Engineering Department
      Depositing User: Admin
      Date Deposited: 23 Sep 2022 17:44
      Last Modified: 04 Apr 2023 15:34
      URI: https://repository.petra.ac.id/id/eprint/20046
