
Conversation

@DanielOaks
Contributor

This fixes an issue where the regexes near the start of getXMLPage kill the process if they run out of memory.

As described in the code comment, we run the .split before the regexes: if a regex runs out of memory the whole process dies, whereas if .split runs out of memory it raises a MemoryError we can catch and handle (see the sketch below).

This lets us download larger pages without dumpgenerator.py just dying unexpectedly.
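To illustrate the idea (this is a minimal sketch of the approach, not the actual patch, and the regex shown is a hypothetical stand-in for the real substitutions in getXMLPage):

```python
import re

def clean_xml(xml):
    """Split first, regex second: an out-of-memory condition in .split
    surfaces as a catchable MemoryError, instead of the regex engine
    taking the whole process down."""
    try:
        # .split raises MemoryError if the page is too large to handle,
        # which the caller can catch and recover from.
        lines = xml.split('\n')
    except MemoryError:
        # Signal the caller to retry with a smaller request or skip the page.
        return None

    # Only run the regexes once the split has succeeded.
    pattern = re.compile(r'\s+$')  # hypothetical cleanup regex
    return '\n'.join(pattern.sub('', line) for line in lines)
```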

Note: I think this may be affecting us in another spot as well. I'll check whether this sort of fix also resolves the other issue I'm running into.

Edit: Now that I'm back home, I've been looking into it, and it may actually have something to do with the Linux OOM killer. Time to do more research and hopefully find out how to get it to raise an exception rather than kill us!

@nemobis
Member

nemobis commented Oct 25, 2015

The change is sane, but we might want to make that replacement faster. In particular, I think we should iterate over the lines and replace them one by one (see the sketch below).
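A possible sketch of that per-line approach, assuming a hypothetical pattern in place of the real substitutions in getXMLPage:

```python
import re

# Hypothetical pattern standing in for the real substitutions in getXMLPage.
EDIT_TAG = re.compile(r'</?edit>')

def replace_per_line(xml):
    """Apply the substitution one line at a time, so the regex engine's
    working set stays small instead of covering the whole multi-megabyte
    page at once."""
    out = []
    for line in xml.split('\n'):
        out.append(EDIT_TAG.sub('', line))
    return '\n'.join(out)
```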

