I have a very simple task. After a migration added a new field (a repeated, composite property) to an existing NDB entity kind (~100K entities), I need to set a default value for it.
I tried this code first:

    q = dm.E.query(ancestor=dm.E.root_key)
    for user in q.iter(batch_size=500):
        user.field1 = [dm.E2()]
        user.put()
But it fails with errors like this:

    2015-04-25 20:41:44.792 /**** 500 599830ms 0kb AppEngine-Google; (+http://code.google.com/appengine) module=default version=1-17-0
    W 2015-04-25 20:32:46.675 suspended generator run_to_queue(query.py:938) raised Timeout(The datastore operation timed out, or the data was temporarily unavailable.)
    W 2015-04-25 20:32:46.676 suspended generator helper(context.py:876) raised Timeout(The datastore operation timed out, or the data was temporarily unavailable.)
    E 2015-04-25 20:41:44.475 Traceback (most recent call last):
      File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/runtime/wsgi.py", line 267, in
The task runs on a separate task queue, so it has at least 10 minutes to execute, but apparently that is not enough. Another strange thing is the warnings from NDB. Maybe there is contention because other instances (triggered by users) update the same entities, but I'm not sure.
Anyway, I want to know the best (and simplest) practice for such a task. I know about MapReduce, but for this it currently looks overcomplicated to me.
I also tried put_multi, grabbing all entities into one array, but GAE stops the instance as soon as it exceeds ~600 MB of memory (with a 500 MB limit). Apparently there isn't enough memory to hold all ~100K entities at once.
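What I'd try next is put_multi over fixed-size batches instead of one giant list, so only one batch sits in memory at a time. Below is a minimal sketch of the batching helper (plain Python, so it runs outside GAE); the NDB usage in the comment assumes the same dm.E model and a batch size of 500, both of which are my assumptions, not a verified recipe:

```python
def chunks(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final partial batch
        yield batch

# With NDB this would presumably look like:
#   q = dm.E.query(ancestor=dm.E.root_key)
#   for batch in chunks(q.iter(batch_size=500), 500):
#       for e in batch:
#           e.field1 = [dm.E2()]
#       ndb.put_multi(batch)
```

This caps memory at one batch of entities, but each put_multi is still a synchronous RPC, so I'm not sure it solves the 10-minute deadline by itself.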