
Example to run Megatron #196

Open · Juanhui28 opened this issue Feb 26, 2024 · 3 comments

Comments

@Juanhui28

Hi,

Thanks for the exciting work! I want to use the parallel methods when running Megatron, but it seems there isn't an example/script for running Megatron, and I cannot find a main function. Could you please share an example of running Megatron with the different parallel methods (e.g., data and model parallelism)? Thanks!

@laekov
Owner

laekov commented Feb 26, 2024

To run FastMoE with Megatron, you should use Megatron's own entry point, e.g. pretrain_gpt.py, with FastMoE's patch applied.
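A minimal sketch of that workflow (the patch path and file name below are illustrative assumptions, not exact paths; check the FastMoE repository for the patch that matches your Megatron version):

```shell
# Sketch only: paths and file names are placeholders.
# 1. Apply the FastMoE patch inside your Megatron-LM checkout.
cd Megatron-LM
git apply /path/to/fastmoe/patch-for-your-megatron-version.patch

# 2. Launch Megatron's own entry point as usual; the patch wires FastMoE in.
python pretrain_gpt.py <your usual Megatron arguments>
```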

@Juanhui28
Author

Thanks for your response!

Which patch should I use if I want to enable expert parallelism? Thanks!

@laekov
Owner

laekov commented Feb 26, 2024

You should use the patch that matches your Megatron version. The key step to enable MoE is adding the --fmoefy argument to the pretrain_xxx.py command line.
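For illustration, a hedged sketch of such a launch (the exact flag set depends on your Megatron version; only --fmoefy is FastMoE-specific here, the rest are standard Megatron/PyTorch launcher arguments):

```shell
# Illustrative only: adapt to your Megatron version and cluster setup.
# --fmoefy replaces the Transformer MLP blocks with FastMoE expert layers
# in the patched pretrain_gpt.py.
torchrun --nproc_per_node=8 pretrain_gpt.py \
    --tensor-model-parallel-size 1 \
    --fmoefy \
    <other standard Megatron arguments>
```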
