
Simple Heuristics for Scheduling Apache Airflow: A Case Study at PT. X

HANS, and Bisono, Indriati Njoto and SOEWANDI, HANIJANTO (2022) Simple Heuristics for Scheduling Apache Airflow: A Case Study at PT. X. [UNSPECIFIED]


Abstract

In the big data era, large volumes of data must be handled properly to support well-informed business decisions, so the data needs to be updated regularly to keep the analysis relevant. Apache Airflow is commonly used to schedule these updates by running queries through sequences of DAG (directed acyclic graph) tasks. Unfortunately, scheduling DAG tasks is an NP-hard problem, so a globally optimal solution is difficult to obtain. This paper presents a simple heuristic algorithm for scheduling a subset of the DAG tasks at PT. X on 2 computers (virtual machines). This is a special case of the P_m|prec|C_max problem with m = 2. Before constructing the algorithm, CPM (Critical Path Method) is used as the baseline to obtain the optimal solution. A knapsack problem is then used to assign additional tasks to virtual machine #1, and a heuristic is used for virtual machine #2, as a compromise that balances cost and completion time. The characteristics of this simple algorithm are discussed with numerical examples and experiments.

Keywords: Heuristics, Apache Airflow, Directed Acyclic Graphs, Critical Path Method.
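The CPM baseline mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the task names, durations, and the helper functions `critical_path` and `makespan_lower_bound` are hypothetical and chosen only for illustration. The sketch computes the longest duration-weighted path through a precedence graph and combines it with the average machine load to give a standard lower bound on the makespan C_max for m machines.

```python
from collections import deque


def critical_path(durations, preds):
    """Return (length, path) of the longest duration-weighted path in a DAG.

    durations: dict mapping task name -> processing time
    preds:     dict mapping task name -> list of predecessor task names
    (Both structures are hypothetical, for illustration only.)
    """
    succs = {t: [] for t in durations}
    indeg = {t: 0 for t in durations}
    for task, parents in preds.items():
        for p in parents:
            succs[p].append(task)
            indeg[task] += 1

    # Kahn's algorithm: visit tasks in topological order.
    ready = deque(t for t in durations if indeg[t] == 0)
    finish = {}      # earliest finish time of each task
    best_pred = {}   # predecessor that determines the earliest start
    while ready:
        t = ready.popleft()
        parents = preds.get(t, [])
        start = max((finish[p] for p in parents), default=0)
        finish[t] = start + durations[t]
        best_pred[t] = max(parents, key=lambda p: finish[p], default=None)
        for s in succs[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)

    if len(finish) != len(durations):
        raise ValueError("precedence graph contains a cycle")

    # Walk back from the task that finishes last to recover the path.
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return length, path[::-1]


def makespan_lower_bound(durations, preds, machines=2):
    """C_max can be no smaller than the critical path or the average load."""
    cp_length, _ = critical_path(durations, preds)
    return max(cp_length, sum(durations.values()) / machines)


# Hypothetical example data, not taken from the paper.
durations = {"extract": 3, "clean": 2, "join": 4, "report": 1, "audit": 5}
preds = {
    "clean": ["extract"],
    "join": ["clean"],
    "report": ["join"],
    "audit": ["extract"],
}
length, path = critical_path(durations, preds)
bound = makespan_lower_bound(durations, preds, machines=2)
```

With this made-up data the critical path is extract, clean, join, report with length 10, which exceeds the average load of 7.5 per machine, so no schedule on 2 machines can finish earlier than 10 time units. Tasks off the critical path (here, "audit") are the ones a knapsack or heuristic step would then have to place on the available machines.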

Item Type: UNSPECIFIED
Subjects: H Social Sciences > HD Industries. Land use. Labor > HD28 Management. Industrial Management
Divisions: Faculty of Industrial Technology > Industrial Engineering Department
Depositing User: Admin
Date Deposited: 23 Sep 2022 17:44
Last Modified: 03 Sep 2024 11:15
URI: https://repository.petra.ac.id/id/eprint/20046
