Hi there,
Thank you for your work! It's a lot of help.
However, I think this code differs from the original paper and the original Theano implementation in a way that may lead to errors. In the original paper and code, the sample-level prediction partitions the sample input into overlapping frames of length frame_size. For example, if seq_input has shape (batch, seq_len), the sample-level input consists of seq_input[:, 0:frame_size], seq_input[:, 1:frame_size+1], seq_input[:, 2:frame_size+2], and so on. As a result, the sample-level input has shape [total_number_of_overlapping_frames (batch*seq_len), frame_size]. In the original Theano implementation, the function images2neibs does this work; you can find it here: https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/2a3dbdf9eb00f03e64adf58e6780e2a48b9ff6dc/models/two_tier/two_tier.py#L394
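To make the framing concrete, here is a minimal NumPy sketch of what I mean. It is not the Theano code and not this repo's code: the function name `overlapping_frames` is made up for illustration, and it only shows the stride-1 windowing described above.

```python
import numpy as np

def overlapping_frames(seq_input, frame_size):
    """Hypothetical helper: all stride-1 windows of length frame_size.

    seq_input: array of shape (batch, seq_len)
    returns:   array of shape (batch * n_frames, frame_size),
               where n_frames = seq_len - frame_size + 1
    """
    batch, seq_len = seq_input.shape
    n_frames = seq_len - frame_size + 1
    # idx[i, j] = i + j, so row i selects samples i .. i + frame_size - 1
    idx = np.arange(n_frames)[:, None] + np.arange(frame_size)[None, :]
    frames = seq_input[:, idx]  # shape (batch, n_frames, frame_size)
    # Flatten batch and frame axes so each row is one sample-level input
    return frames.reshape(batch * n_frames, frame_size)

# Small example: 2 sequences of 5 samples, frame_size = 3
x = np.arange(10).reshape(2, 5)
frames = overlapping_frames(x, 3)
print(frames.shape)  # (6, 3)
print(frames[:3])    # rows [0 1 2], [1 2 3], [2 3 4]
```

In this sketch a row of seq_len samples yields seq_len - frame_size + 1 frames, so getting exactly batch*seq_len frames would require the input to carry frame_size - 1 extra samples per row. That part is my assumption about how the shapes line up, not something taken from the code.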
I am not sure whether this has been implemented in the sample_level_prediction function. I found this issue because I cannot generate useful audio when frame_size is anything other than 2.
Also, please don't hesitate to correct me if I am wrong somewhere.
Best regards,
Nic