Reading notes: Fuxi: a Fault-Tolerant Resource Management and Job Scheduling System at Internet Scale

    This is the paper on Alibaba Cloud's Fuxi platform. A few points I found interesting:

    Fuxi: a resource management and job scheduling system. (My impression is that it is built on YARN; it looks a lot like YARN.)

        1, an incremental resource management protocol

        2, a user-transparent failure recovery

        3, an effective faulty-node detection mechanism and a multi-level blacklisting scheme
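To make the first point concrete, here is a minimal sketch of what "incremental" can mean for a resource protocol: the application side reports only the change in its demand since the last message, and the scheduler folds that change into its recorded state. This is an illustration under my own assumptions, not the paper's actual protocol; `IncrementalRequester`, `apply_delta`, and the resource names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalRequester:
    """Hypothetical client-side helper: remembers what the scheduler was last told."""

    last_sent: dict = field(default_factory=dict)  # resource kind -> units requested

    def delta(self, wanted: dict) -> dict:
        """Return only the entries that changed since the previous message."""
        changes = {}
        for kind in set(wanted) | set(self.last_sent):
            diff = wanted.get(kind, 0) - self.last_sent.get(kind, 0)
            if diff != 0:
                changes[kind] = diff
        self.last_sent = dict(wanted)
        return changes


def apply_delta(state: dict, changes: dict) -> dict:
    """Hypothetical scheduler side: fold a delta into the recorded demand."""
    updated = dict(state)
    for kind, diff in changes.items():
        updated[kind] = updated.get(kind, 0) + diff
        if updated[kind] == 0:
            del updated[kind]  # demand fully withdrawn
    return updated


requester = IncrementalRequester()
first = requester.delta({"cpu": 4, "mem": 8})   # full demand on the first message
second = requester.delta({"cpu": 6, "mem": 8})  # only the cpu change is sent
third = requester.delta({"cpu": 6})             # mem demand is withdrawn

scheduler_view = {}
for message in (first, second, third):
    scheduler_view = apply_delta(scheduler_view, message)
```

The point of the design is that message size tracks how much the demand changed, not how large the total demand is.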

    Fuxi (FuxiMaster, AppMaster, Tubo) <-> YARN (ResourceManager, AppMaster, NodeManager)

    Differences between Fuxi and YARN:

        1, Fuxi separates the notion of a task (the application process that performs the actual work) from that of a container (the unit of resource grant). Once an application master receives a grant, it explicitly controls the container's life cycle and may reuse the container to run multiple tasks.

        2, Locality-tree-based scheduling.
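The task/container separation in point 1 can be sketched as follows: the application master holds on to its granted containers and hands each one several tasks in turn, instead of asking for a fresh grant per task. This is a toy model under my own assumptions; `Container` and `run_all` are hypothetical names, and nothing here launches a real process.

```python
from collections import deque


class Container:
    """A granted slice of resources; hypothetical stand-in for a resource grant."""

    def __init__(self, container_id: str):
        self.container_id = container_id
        self.tasks_run = []  # names of the tasks executed in this container

    def run(self, task_name: str) -> None:
        # A real system would start a worker process here; this sketch only records it.
        self.tasks_run.append(task_name)


def run_all(tasks, containers):
    """Drain the task queue, reusing each granted container for several tasks."""
    if tasks and not containers:
        raise ValueError("no containers granted")
    pending = deque(tasks)
    while pending:
        for container in containers:
            if not pending:
                break
            container.run(pending.popleft())
    return containers


granted = [Container("c0"), Container("c1")]
run_all(["t0", "t1", "t2", "t3", "t4"], granted)
# Five tasks complete on two grants: c0 ran t0, t2, t4 and c1 ran t1, t3.
```

Because the grant outlives any single task, the scheduler is consulted once per container rather than once per task.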