[QUESTION] LLaVA model_type, pipeline parallel training #1078
Replies: 2 comments
Hi @KookHoiKim, did you resolve this?
Hey, I guess the PP=2 failure is an architectural constraint in Megatron-LM's multimodal pipeline implementation. LLaVA uses the `encoder_and_decoder` model_type, so the vision encoder is scheduled as a separate encoder stage rather than being split across the language-model pipeline. The correct configuration is to give the encoder its own stage via `encoder_pipeline_model_parallel_size` (with enough ranks to cover it), not a plain PP=2 split of the whole model. Please correct me otherwise. Thank you.
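For context, a minimal sketch of the distinction between the two model types (assuming `megatron.core.enums` is importable; the variable names are illustrative, not Megatron-LM source):

```python
# Illustrative sketch only: what the two model_type values signal to the trainer.
from megatron.core.enums import ModelType

# A decoder-only GPT registers as encoder_or_decoder: every pipeline stage
# holds language-model layers, and p2p buffers carry (seq_len, batch, hidden).
gpt_model_type = ModelType.encoder_or_decoder

# LLaVA registers as encoder_and_decoder: the vision encoder + projector act
# as an "encoder" whose output must be routed into the first language-model
# stage, which constrains how pipeline stages can be laid out.
llava_model_type = ModelType.encoder_and_decoder
```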
I'm trying to apply pipeline parallelism when training LLaVA (TP=1, PP=2).
Although I followed the instructions, the code is not working (TP=2, PP=1 works).
And I found some weird points in the code.
In my understanding, the vision encoder / vision projector is an additional embedding part, which is only used in the pre_process part.
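To illustrate that reading, here is a toy sketch of the pre_process pattern (my own simplification, not the actual `LLaVAModel` code): the input-side modules exist only on the stage constructed with `pre_process=True`, while later stages just consume hidden states.

```python
import torch

# Toy, hypothetical module sketching Megatron's pre_process pattern.
# Input-side modules (text embedding, vision encoder, vision projector)
# live only on the first pipeline stage.
class TinyLLaVAStage(torch.nn.Module):
    def __init__(self, hidden_size, pre_process):
        super().__init__()
        self.pre_process = pre_process
        if pre_process:
            self.embedding = torch.nn.Embedding(32000, hidden_size)
            self.vision_encoder = torch.nn.Linear(1024, hidden_size)   # stand-in
            self.vision_projector = torch.nn.Linear(hidden_size, hidden_size)
        self.layers = torch.nn.ModuleList(
            torch.nn.Linear(hidden_size, hidden_size) for _ in range(2)
        )

    def forward(self, inp, image_feats=None):
        if self.pre_process:
            text = self.embedding(inp)                                 # (s, b, h)
            img = self.vision_projector(self.vision_encoder(image_feats))
            hidden = torch.cat([img, text], dim=0)  # image tokens prepended
        else:
            hidden = inp  # hidden states received from the previous stage
        for layer in self.layers:
            hidden = layer(hidden)
        return hidden
```

Under this view, a later pipeline stage never needs the vision modules themselves; it only needs the combined hidden states from the previous stage.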
However, the LLaVA model is initialized with the `encoder_and_decoder` model_type. Why not `encoder_or_decoder`? Furthermore, during PP communication, the recv/send tensor shape is set to `(num_image_token, B, hidden_size)`. It seems the shards give/take the vision embedding, not the intermediate states from the middle of the language model.
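If the intermediate states were being exchanged, the buffer crossing stages should cover the combined sequence, not just the image part. A small arithmetic sketch (all numbers and names are illustrative, not Megatron-LM's; 576 image tokens corresponds to a 336px CLIP ViT-L/14, i.e. (336/14)²):

```python
# Illustrative shape arithmetic for the pipeline send/recv buffer.
# After the first stage splices image embeddings into the text sequence,
# later stages should exchange the combined sequence in Megatron's
# (seq_len, batch, hidden) layout.
num_image_tokens = 576        # e.g. CLIP ViT-L/14 @ 336px: (336 // 14) ** 2
text_seq_len = 1024
micro_batch_size = 2
hidden_size = 4096

combined_seq_len = num_image_tokens + text_seq_len
send_recv_shape = (combined_seq_len, micro_batch_size, hidden_size)
print(send_recv_shape)        # (1600, 2, 4096)
```

A shape of `(num_image_token, B, hidden_size)` alone would indeed look like only the vision embedding is being communicated, which is the concern above.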
P.S. Currently, I do not use `encoder_pipeline_model_parallel_size` / encoder tensor parallel size, because initializing Megatron raises an error when `world_size % total_model_size` is not divisible (see the arithmetic sketch below). So I forced `vision_config.pipeline_model_parallel_size` to be 1. I am not familiar with the Megatron code, and I really hope to get some help with LLaVA training.
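For reference, a sketch of the divisibility arithmetic that appears to be behind that error (a simplification of the check in `megatron/core/parallel_state.py`; the function and argument names are mine):

```python
# Simplified reconstruction of the world-size check, not the exact
# Megatron-LM code: encoder ranks are added on top of the decoder's
# TP x PP x CP grid.
def total_model_size(tp, pp, cp=1, encoder_tp=0, encoder_pp=0):
    decoder_size = tp * pp * cp
    encoder_size = encoder_tp * encoder_pp * cp
    return decoder_size + encoder_size

# TP=1, PP=2 for the language model plus one encoder pipeline stage:
# 1*2*1 + 1*1*1 = 3, so world_size must be a multiple of 3 -- a 2-GPU run
# fails the `world_size % total_model_size` check.
assert total_model_size(tp=1, pp=2, encoder_tp=1, encoder_pp=1) == 3
```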
Thank you.