7-1-1. Splitter

The Splitter encrypts executable jobs where possible and distributes training data to nodes. Its main roles and process flow are as follows:

  1. Calling the Splitter corresponding to the ProgramId and dividing the dataset
     The ProgramId is a unique identifier for the program, and the Splitter selects the appropriate subprogram based on this identifier. The Splitter converts the given dataset into a format suited to the machine learning algorithm and divides it according to NumParallel. NumParallel is the number of subprograms to be run in parallel, and the dataset is divided evenly into that many parts.

  2. Dividing Jobs and registering SubJobs to EMETH Core (L1) / EMETH Execution Layer (L2)
     The Splitter divides the Job corresponding to the split dataset into multiple SubJobs. Each SubJob is assigned a unique ID and keeps its relationship to the parent Job. The SubJobs are then registered with EMETH Core (L1) / EMETH Execution Layer (L2), ready for distributed processing. The ProgramId differs between parent Jobs and SubJobs, which expresses the hierarchical structure of the program. (Example: ProgramId:1 = parent Job processing for overall GPT-3 training, ProgramId:11 = SubJob processing for ProgramId:1)
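
The two steps above can be illustrated with a minimal sketch. It is not EMETH's actual implementation: the `SubJob` fields, the `<parent_job_id>-<index>` ID format, and the function names `split_evenly` and `make_sub_jobs` are assumptions made for illustration only. Registration with EMETH Core (L1) / EMETH Execution Layer (L2) and encryption are left out, because this section does not describe their interfaces.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SubJob:
    """Hypothetical SubJob record; field names are illustrative only."""
    sub_job_id: str      # assumed format: "<parent_job_id>-<index>"
    parent_job_id: str   # keeps the link back to the parent Job
    program_id: int      # ProgramId of the subprogram (e.g. 11 for parent 1)
    data: list           # this SubJob's share of the dataset


def split_evenly(dataset: Sequence, num_parallel: int) -> List[list]:
    """Divide dataset into num_parallel contiguous chunks whose sizes
    differ by at most one element."""
    if num_parallel < 1:
        raise ValueError("num_parallel must be at least 1")
    base, extra = divmod(len(dataset), num_parallel)
    chunks, start = [], 0
    for i in range(num_parallel):
        # The first `extra` chunks take one additional element each.
        size = base + (1 if i < extra else 0)
        chunks.append(list(dataset[start:start + size]))
        start += size
    return chunks


def make_sub_jobs(parent_job_id: str, sub_program_id: int,
                  dataset: Sequence, num_parallel: int) -> List[SubJob]:
    """Create one SubJob per chunk, each with its own ID and a reference
    to the parent Job."""
    return [
        SubJob(f"{parent_job_id}-{i}", parent_job_id, sub_program_id, chunk)
        for i, chunk in enumerate(split_evenly(dataset, num_parallel))
    ]
```

For example, `make_sub_jobs("job-42", 11, list(range(10)), 3)` produces three SubJobs holding 4, 3, and 3 items. When the dataset size is not a multiple of NumParallel, this sketch gives the leftover items to the first chunks; how EMETH handles that case is not stated here.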

The Splitter divides programs and jobs and applies encryption to keep the data confidential, which enables efficient distributed processing. This allows fast and secure processing even for large-scale datasets.

The Splitter is central to EMETH's architecture. Dividing and distributing the data is the foundation for parallel processing, which improves overall performance, and encrypting the data is essential for privacy and security.
