7-1-2. Splitter

The Splitter plays a crucial role in EMETH: it encrypts the jobs to be executed and distributes training data to nodes. Its main roles and processing flow are as follows.

  1. Calling the Splitter that corresponds to the ProgramId and splitting the dataset: The ProgramId uniquely identifies a program, and the Splitter uses it to select the appropriate sub-program. The Splitter converts the given dataset into a format suited to the machine learning algorithm and splits it according to NumParallel, the number of sub-programs to be processed in parallel. The dataset is divided evenly into NumParallel parts.

  2. Dividing the Job and registering the SubJobs on the Blockchain: The Splitter divides the Job that corresponds to the split dataset into multiple SubJobs. Each SubJob is assigned a unique ID and keeps a link to its parent Job. The SubJobs are then registered on the Blockchain, where they are ready for distributed processing. The parent Job and its SubJobs have different ProgramIds, which reflects the hierarchical structure of the program. (Example: ProgramId 1 is the parent Job process for overall GPT-3 training, and ProgramId 11 is a SubJob process of ProgramId 1.)

  3. Scheduling multiple epochs: Num Epoch is the number of passes (epochs) that training makes over the data. The Splitter schedules one epoch process (Job) per epoch according to Num Epoch, and in each epoch the SubJobs are executed and training progresses. Training over multiple epochs generally improves model accuracy and yields a higher-quality model.
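Step 1 above can be sketched as follows. This is a minimal illustration, not EMETH's implementation: the registry `SPLITTERS`, the functions `split_dataset` and `run_splitter`, and the choice of contiguous shards are all hypothetical, and the format-conversion part of the step is omitted. The sketch shows how a ProgramId can select a splitter and how a dataset can be divided "evenly" into NumParallel shards when its length is not an exact multiple.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def split_dataset(records: Sequence[T], num_parallel: int) -> List[List[T]]:
    """Split records into num_parallel contiguous shards whose sizes differ by at most one."""
    if num_parallel < 1:
        raise ValueError("num_parallel must be >= 1")
    base, extra = divmod(len(records), num_parallel)
    shards: List[List[T]] = []
    start = 0
    for i in range(num_parallel):
        # The first `extra` shards take one extra record, so nothing is dropped
        # when len(records) is not a multiple of num_parallel.
        size = base + (1 if i < extra else 0)
        shards.append(list(records[start:start + size]))
        start += size
    return shards


# Hypothetical registry mapping a ProgramId to its splitter function.
SPLITTERS: Dict[int, Callable[[Sequence, int], List[list]]] = {1: split_dataset}


def run_splitter(program_id: int, records: Sequence, num_parallel: int) -> List[list]:
    """Look up the splitter registered for program_id and apply it."""
    splitter = SPLITTERS.get(program_id)
    if splitter is None:
        raise ValueError(f"no splitter registered for ProgramId {program_id}")
    return splitter(records, num_parallel)


if __name__ == "__main__":
    print([len(s) for s in run_splitter(1, list(range(10)), 3)])  # [4, 3, 3]
```

Contiguous shards keep each node's data in its original order; a round-robin split would balance sizes equally well and is an equally valid reading of "evenly divided".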
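Step 2 can be illustrated with a small data model. This sketch rests on several assumptions that the text does not specify: the SubJob ID format `<job_id>-<index>`, the rule `parent ProgramId * 10 + 1` (chosen only to mirror the "ProgramId 1 → ProgramId 11" example), and `InMemoryLedger`, a hypothetical append-only list standing in for the actual Blockchain registration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Job:
    job_id: str
    program_id: int


@dataclass(frozen=True)
class SubJob:
    subjob_id: str       # unique ID of this SubJob
    parent_job_id: str   # link back to the parent Job
    program_id: int      # differs from the parent's ProgramId
    shard_index: int     # which dataset shard this SubJob processes


def make_subjobs(parent: Job, num_parallel: int) -> List[SubJob]:
    # Assumption: derive the SubJob ProgramId as parent * 10 + 1 (1 -> 11).
    sub_program_id = parent.program_id * 10 + 1
    return [
        SubJob(f"{parent.job_id}-{i}", parent.job_id, sub_program_id, i)
        for i in range(num_parallel)
    ]


class InMemoryLedger:
    """Hypothetical stand-in for the Blockchain: an append-only list of SubJobs."""

    def __init__(self) -> None:
        self._records: List[SubJob] = []

    def register(self, subjob: SubJob) -> None:
        # Reject duplicate IDs so every registered SubJob stays uniquely addressable.
        if any(r.subjob_id == subjob.subjob_id for r in self._records):
            raise ValueError(f"duplicate SubJob id {subjob.subjob_id}")
        self._records.append(subjob)

    def subjobs_of(self, parent_job_id: str) -> List[SubJob]:
        # Follow the parent link to recover every SubJob of a Job.
        return [r for r in self._records if r.parent_job_id == parent_job_id]
```

Keeping the parent ID on each SubJob means the parent Job can be reassembled from the ledger alone, without a separate index.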
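Step 3 can be sketched as a plan of one Job per epoch. The sketch assumes, as is usual in training, that epochs run sequentially because each pass starts from the parameters produced by the previous one, so each epoch Job records a dependency on its predecessor. `EpochJob`, `schedule_epochs`, and the ID format `<base>-e<epoch>-<index>` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EpochJob:
    job_id: str
    epoch: int
    depends_on: Optional[str]  # previous epoch's job_id; None for the first epoch
    subjob_ids: List[str]      # SubJobs executed in parallel within this epoch


def schedule_epochs(base_job_id: str, num_epoch: int, num_parallel: int) -> List[EpochJob]:
    """Create one Job per epoch, each with num_parallel SubJobs, chained in order."""
    if num_epoch < 1 or num_parallel < 1:
        raise ValueError("num_epoch and num_parallel must both be >= 1")
    schedule: List[EpochJob] = []
    prev: Optional[str] = None
    for epoch in range(num_epoch):
        job_id = f"{base_job_id}-e{epoch}"
        subjob_ids = [f"{job_id}-{i}" for i in range(num_parallel)]
        schedule.append(EpochJob(job_id, epoch, prev, subjob_ids))
        prev = job_id  # the next epoch waits for this one to finish
    return schedule
```

Within an epoch the SubJobs are independent and can run in parallel; across epochs the `depends_on` chain enforces order.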

By dividing programs and jobs and applying encryption, the Splitter keeps data confidential while enabling efficient distributed processing. As a result, even large datasets can be processed quickly and securely.

The Splitter plays an essential role in the EMETH architecture. Dividing and distributing data forms the foundation of parallel processing and significantly improves overall performance, and protecting data through encryption is essential for privacy and security.

The Splitter works together with the Orchestrator to coordinate and manage distributed processing as a whole: Jobs divided by the Splitter are scheduled by the Orchestrator and processed efficiently.
