
Conversation

@DanielOaks
Contributor

This fixes an issue where the regexes near the start of getXMLPage kill the process if they run out of memory.

As described in the code comment, we run the .split before the regexes: if a regex runs out of memory the whole process dies, whereas if .split runs out of memory it raises a MemoryError we can catch and handle (see the sketch below).

This lets us download larger pages without dumpgenerator.py just dying unexpectedly.
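To illustrate the idea (this is a minimal sketch of the approach, not the actual patch, and the regex shown is a hypothetical stand-in for the real substitutions in getXMLPage):

```python
import re

def clean_xml(xml):
    """Split first, regex second: an out-of-memory condition in .split
    surfaces as a catchable MemoryError, instead of the regex engine
    taking the whole process down."""
    try:
        # .split raises MemoryError if the page is too large to handle,
        # which the caller can catch and recover from.
        lines = xml.split('\n')
    except MemoryError:
        # Signal the caller to retry with a smaller request or skip the page.
        return None

    # Only run the regexes once the split has succeeded.
    pattern = re.compile(r'\s+$')  # hypothetical cleanup regex
    return '\n'.join(pattern.sub('', line) for line in lines)
```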

Note: I think this may be affecting us in another spot as well. I'll check whether this sort of fix also resolves the other issue I'm running into.

Edit: Now that I'm back home, I've been looking into it, and it may actually have something to do with the Linux OOM killer. Time to do more research and hopefully find out how to get it to raise an exception rather than kill us!

@nemobis
Member

nemobis commented Oct 25, 2015

The change is sane, but we might want to make that replacement faster. In particular, I think we should iterate over the lines and replace them one by one (see the sketch below).
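A possible sketch of that per-line approach, assuming a hypothetical pattern in place of the real substitutions in getXMLPage:

```python
import re

# Hypothetical pattern standing in for the real substitutions in getXMLPage.
EDIT_TAG = re.compile(r'</?edit>')

def replace_per_line(xml):
    """Apply the substitution one line at a time, so the regex engine's
    working set stays small instead of covering the whole multi-megabyte
    page at once."""
    out = []
    for line in xml.split('\n'):
        out.append(EDIT_TAG.sub('', line))
    return '\n'.join(out)
```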

