Skip to content
This repository was archived by the owner on Jan 3, 2023. It is now read-only.
This repository was archived by the owner on Jan 3, 2023. It is now read-only.

slow, single threaded #459

@bbudescu

Description

@bbudescu

Hi, I've been trying to use ngraph to accelerate my tensorflow detector/testing pipeline, but, unfortunately, without any success so far. The inference process either has the same performance, or becomes painstakingly slow.

I'm not quite sure whether I'm installing and using ngraph right.

I'm not quite sure whether this is the right place to ask these questions, since it might just be something obvious that I've missed, thus not being an actual issue, but I couldn't find any other support channel. If there is a different, more appropriate one, please direct me to it.

For installation, I've used pip inside my own dockerfile to install ngraph-tensorflow-bridge, following the instructions on this repo (and also installed plaidml, since I've noticed ngraph looks for it during initialization; I didn't build the ngraph library myself, since I noticed that the bridge supplies the .so, and it doesn't complain when loading it).

Also, I've tried turning on xla, but it has no effect

config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
tf.keras.backend.set_session(tf.Session(config=config))

Also, I've tested ngraph with intel-tensorflow. When ngraph is off, intel-tf gets about twice as fast as vanilla tf. When ngraph bridge is imported, the performance is really, really low (i.e., I've got bored of waiting for an operation to finish that takes a few seconds when ngraph is not used).

Also, I've tried both 'NCHW' and 'NHWC' under both the vanilla and the intel distributions of tensorflow.

For usage, I only added import ngraph_bridge after importing tensorflow. Is there something else I'm supposed to do?

I didn't get any stdout/stderr message to help me figure out whether ngraph is actually on or not. I've looked through the output of tensorflow.python.client.device_lib.list_local_devices(), but nothing seems to change when adding the import. The only indication that ngraph is used is when I don't disable my GPU (os.environ["CUDA_VISIBLE_DEVICES"] = ""), and I get an error message.

Here is the code I've used for testing out ngraph (it's based on the keras example in this repo). I think the longest I've waited to see some training progress was 10 minutes (without ngraph, I get a progress bar update after under half a minute).

import numpy as np
import os

os.environ["CUDA_VISIBLE_DEVICES"] = ""
os.environ["KMP_BLOCKTIME"] = "0"
os.environ["OMP_NUM_THREADS"] = "4"
os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,compact,1,0"
os.environ['KERAS_BACKEND'] = 'tensorflow'

import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
import ngraph_bridge

# A simple script to run inference and training on resnet 50

config = tf.ConfigProto()
config.intra_op_parallelism_threads = 4
config.inter_op_parallelism_threads = 4

# config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1

tf.keras.backend.set_session(tf.Session(config=config))
# tf.keras.backend.set_image_data_format('channels_first')

model = ResNet50(weights=None)

batch_size = 128
img = np.random.rand(batch_size, 224, 224, 3)
# img = np.random.rand(batch_size, 3, 224, 224)

preds = model.predict(preprocess_input(img))
print('Predicted:', decode_predictions(preds, top=3)[0])
model.compile(tf.keras.optimizers.SGD(), loss='categorical_crossentropy')
preds = model.fit(
    preprocess_input(img), np.zeros((batch_size, 1000), dtype='float32'))
print('Ran a train round')

I've also tried ngraph for a different code that doesn't use keras (it uses tensorflow's object_detection API instead). Speed is at least 20% lower when using ngraph. For some models the process would have a much larger memory footprint when using ngraph vs. when not (I noticed that because my laptop started paging and crashed).

I've also noticed that while training the keras examples, only one of the 8 logical cores of my cpu is used. This happens when running inference on the detection model, but a fraction of the time more than one core is saturated.

Thanks

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions