Description
I'm trying to apply pipeline parallelism when training LLaVA (TP=1, PP=2).
Although I followed the instructions, the code does not work (TP=2, PP=1 works).
I also found a few points in the code that look odd to me.
- LLaVA is basically a decoder-only model.
In my understanding, the vision encoder / vision projector is an additional embedding step that is only used in the pre_process part.
However, the LLaVA model is initialized with the encoder_and_decoder model_type. Why not encoder_or_decoder? (See the sketch below.)
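
For reference, this is the distinction I mean (a minimal sketch; the import path is the one I see in my Megatron checkout and may differ between versions):

```python
# Illustration only: Megatron's ModelType enum distinguishes the two cases.
# Since the vision tower only acts as an extra embedding step in pre_process,
# I would expect the decoder-only variant, not encoder_and_decoder.
from megatron.core.enums import ModelType

expected = ModelType.encoder_or_decoder    # decoder-only pipeline, single send/recv path
observed = ModelType.encoder_and_decoder   # what the LLaVA example seems to use
print("expected:", expected, "| observed:", observed)
```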
Furthermore, during pipeline-parallel communication, the recv/send tensor shape is set to (num_image_tokens, B, hidden_size).
It looks like the stages exchange the vision embedding rather than the intermediate states from the middle of the language model.
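
To make that concrete, here is the shape I would expect to be exchanged between stages versus what I observe (plain Python, just arithmetic; the sizes are placeholder values for illustration):

```python
# Illustration only: expected vs. observed inter-stage tensor shapes.
# In a decoder-only pipeline, each stage normally sends/receives the hidden
# states of the full combined (text + image) sequence: (seq_len, B, hidden).
seq_length = 2048          # placeholder: combined sequence length after merging image tokens
num_image_tokens = 576     # placeholder: number of vision-projector output tokens
micro_batch_size = 2       # placeholder
hidden_size = 4096         # placeholder

expected_shape = (seq_length, micro_batch_size, hidden_size)        # intermediate LM states
observed_shape = (num_image_tokens, micro_batch_size, hidden_size)  # looks like the vision embedding
print("expected:", expected_shape)
print("observed:", observed_shape)
```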
P.S. I am currently not using encoder_pipeline_model_parallel_size / encoder tensor parallel size, because setting them raises an error during Megatron initialization saying that world_size is not divisible by the total model size (world_size % total_model_size).
So I forced vision_config.pipeline_model_parallel_size to 1.
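
For completeness, this is roughly my workaround (a sketch of my own local change, not of the upstream code; vision_config is the config object for the vision tower in the LLaVA example):

```python
# Sketch of my local workaround (not upstream code): keep the vision encoder on
# the first pipeline stage so that Megatron's world-size divisibility check passes.
def patch_vision_config(vision_config):
    vision_config.pipeline_model_parallel_size = 1  # force PP=1 for the vision part
    return vision_config

# The check that fails without this is roughly:
#   total_model_size = TP * PP (+ encoder model-parallel ranks)
#   assert world_size % total_model_size == 0
```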
I am not familiar with the Megatron code base, and I would really appreciate some help with LLaVA training.
Thank you.