This repository was archived by the owner on Nov 15, 2024. It is now read-only.

outbuf.write is too slow #17

@izhukov1992

Description


Hi!

Thanks a lot for this exciting Cython extension for Avro serialization! It makes my code approximately 2x faster.

However, the next round of profiling shows the bottleneck in io.py (BytesIO.write), which is used during serialization. Perhaps I am using it the wrong way (please correct me if so; maybe I need to use something other than BytesIO):

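from io import BytesIO

# FastDatumWriter and FastBinaryEncoder are imported from this extension (module path omitted);
# schema_object and dict_records are defined elsewhere.
buff = BytesIO()
writer = FastDatumWriter(schema_object)
encoder = FastBinaryEncoder(buff)

for dict_record in dict_records:
    buff.seek(0)
    buff.truncate()  # drop bytes left over from a longer previous record
    writer.write(dict_record, encoder)
    serialized = buff.getvalue()  # encoded bytes for this record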

If this is the correct usage, could you suggest how to use native Cython data structures in fast_binary.pyx? (I'm fairly new to Cython.) Then I could create my own fork and try to implement it without BytesIO.

My own view of this "problem" is that the .write() method is invoked once for each field of the schema. If the schema is fairly complex, .write() ends up being called many times per record (in my profiling report it takes 50% of the execution time). It may be possible to fill an internal Cython data structure (maybe a char*) and write it to the BytesIO once at the end.
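A minimal pure-Python sketch of the idea above, assuming a hypothetical three-field schema (`id: long`, `name: string`, `score: long`). The names `CountingBytesIO`, `PerFieldEncoder`, `AccumulatingEncoder` and `encode_record` are illustrative only and are not part of this extension's API. Longs use Avro's zig-zag varint encoding and strings are length-prefixed UTF-8, as described in the Avro specification. The sketch only demonstrates the difference in the number of `write()` calls; whether the same change speeds up fast_binary.pyx depends on its Cython implementation.

```python
from io import BytesIO


class CountingBytesIO(BytesIO):
    """BytesIO that counts how many times write() is called."""

    def __init__(self) -> None:
        super().__init__()
        self.write_calls = 0

    def write(self, data) -> int:
        self.write_calls += 1
        return super().write(data)


def zigzag_varint(n: int) -> bytes:
    """Avro long encoding: zig-zag map to unsigned, then base-128 varint."""
    u = (n << 1) ^ (n >> 63)  # zig-zag for 64-bit signed values
    out = bytearray()
    while u & ~0x7F:  # more than 7 significant bits remain
        out.append((u & 0x7F) | 0x80)  # low 7 bits plus continuation flag
        u >>= 7
    out.append(u)
    return bytes(out)


class PerFieldEncoder:
    """Mimics the reported pattern: one stream.write() per encoded value."""

    def __init__(self, stream) -> None:
        self.stream = stream

    def write_long(self, n: int) -> None:
        self.stream.write(zigzag_varint(n))

    def write_utf8(self, s: str) -> None:
        data = s.encode("utf-8")
        self.stream.write(zigzag_varint(len(data)))  # length prefix
        self.stream.write(data)

    def flush(self) -> None:
        pass  # nothing buffered


class AccumulatingEncoder:
    """Appends every value to a private bytearray; one stream.write() per flush."""

    def __init__(self, stream) -> None:
        self.stream = stream
        self.buf = bytearray()

    def write_long(self, n: int) -> None:
        self.buf += zigzag_varint(n)

    def write_utf8(self, s: str) -> None:
        data = s.encode("utf-8")
        self.buf += zigzag_varint(len(data))  # length prefix
        self.buf += data

    def flush(self) -> None:
        self.stream.write(bytes(self.buf))  # single write for the whole record
        self.buf.clear()


def encode_record(encoder, record: dict) -> None:
    # Hypothetical schema: {"id": long, "name": string, "score": long}
    encoder.write_long(record["id"])
    encoder.write_utf8(record["name"])
    encoder.write_long(record["score"])
    encoder.flush()


record = {"id": 1, "name": "foo", "score": -2}
a, b = CountingBytesIO(), CountingBytesIO()
encode_record(PerFieldEncoder(a), record)
encode_record(AccumulatingEncoder(b), record)
print(a.write_calls, b.write_calls)  # 4 1
print(a.getvalue() == b.getvalue())  # True
```

Both encoders produce identical bytes; only the number of calls into the output stream differs. In Cython the accumulating buffer could instead be a typed, C-level structure owned by the encoder, but the shape of the change would be the same: per-field appends into memory and one write to the output stream per record.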

I'd be happy to hear back from you!
Thank you!
