I am using this model to run inference on 1 million data points on a machine with 4 A100 GPUs. I launch inference.py in a Google Vertex AI container.
How can I make the inference code utilize all 4 GPUs so that inference is as fast as possible? (One possible sharding approach I'm considering is sketched after the code below.)
Here is the code I use in inference.py:
from styleformer import Styleformer
import torch
import warnings

warnings.filterwarnings("ignore")

# style = [0=Casual to Formal, 1=Formal to Casual, 2=Active to Passive, 3=Passive to Active]
sf = Styleformer(style=1)

def set_seed(seed):
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(1212)

source_sentences = [
    "I would love to meet attractive men in town",
    "Please leave the room now",
    "It is a delicious icecream",
    "I am not paying this kind of money for that nonsense",
    "He is on cocaine and he cannot be trusted with this",
    "He is a very nice man and has a charming personality",
    "Let us go out for dinner",
    "We went to Barcelona for the weekend. We have a lot of things to tell you.",
]

for source_sentence in source_sentences:
    # inference_on = [0=Regular model on CPU, 1=Regular model on GPU, 2=Quantized model on CPU]
    target_sentence = sf.transfer(source_sentence, inference_on=1, quality_filter=0.95, max_candidates=5)
    print("[Formal] ", source_sentence)
    if target_sentence is not None:
        print("[Casual] ", target_sentence)
    else:
        print("No good quality transfers available !")
    print("-" * 100)