---
tags: Project Notes
---

# Speedup / C redesign

from olexa:

Today, my involvement in this project nominally comes to an end, because the IDT/ART team is supposed to return to its regular duties. I'm sorry I wasn't able to deliver the 1-2 orders of magnitude speedup I'd hoped for; a 2x speedup was what I could accomplish by bypassing some hot paths in the Human and Environment timestamp code while minimizing the knock-on changes elsewhere in the codebase.

I hear rumours, however, that you, Martin, are taking over responsibility for the repo and have plans for a location/zone-based parallel approach. If so, ping me when you're architecting it. I have a rough design in mind: a BaseEnvironment object written partially in C, containing up to a few million BaseHumans also written partially in C. These would then be subclassed by Python Environment and Human objects written mostly in Python.

On a large machine (ideally a fat 64-core/128-thread single- or dual-socket AMD box; such things can be procured), the code would os.fork() itself into one process per hardware thread and manipulate Human objects allocated out of a massive, flat, shared-memory array (a rough sketch follows at the end of this note). "Sending" a Human to another process would involve "simply" injecting a pointer to that Human into the receiving process's priority queue inside the shared-memory array; this is a job for C code (second sketch below). If all Human attributes can be reduced to non-Python types such as bits, floats, 64-bit ints, or fixed-size arrays of the above, no pickling/unpickling would be required to share Humans between processes. It's still okay if not all Human attributes can be reduced so easily, but then there are extra complexities.

At several million Humans, and especially given the slowness of Python code, contention between the processes for shared-memory accesses should be low enough that you see approximately linear speedup (~64x on a 64-core machine). That should pull 1M-person simulations within reach.
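To make the flat shared-memory array idea concrete, here is a minimal sketch in pure Python (no C yet). The record layout (infection_state, zone, next_event_time), the per-worker partitioning, and all names are hypothetical stand-ins rather than anything from the current codebase; the point is only that a Human reduced to fixed-size C types can live in one big array that every forked process shares.

```python
# Minimal sketch: BaseHuman as a plain-C record in one flat shared-memory
# array, manipulated by forked worker processes.  Field names, the
# partitioning scheme, and the sizes below are hypothetical illustrations.
import ctypes
import multiprocessing as mp
import os

N_HUMANS = 100_000            # stand-in; the real target is a few million
N_WORKERS = os.cpu_count() or 4

class BaseHumanRecord(ctypes.Structure):
    """A Human reduced to non-Python types: bits/ints/floats only."""
    _fields_ = [
        ("infection_state", ctypes.c_uint8),
        ("zone",            ctypes.c_int32),
        ("next_event_time", ctypes.c_double),
    ]

# One flat array allocated in shared memory *before* forking; every child
# process inherits the same backing pages, so no pickling is ever needed.
shared_humans = mp.Array(BaseHumanRecord, N_HUMANS, lock=False)

def worker(worker_id: int, humans) -> None:
    # Toy partitioning: each worker owns a contiguous slice of indices.
    # A location/zone-based scheme would partition by geography instead.
    chunk = N_HUMANS // N_WORKERS
    for i in range(worker_id * chunk, (worker_id + 1) * chunk):
        h = humans[i]                 # a view into shared memory, not a copy
        h.zone = worker_id
        h.next_event_time += 1.0      # stand-in for advancing this Human's clock

if __name__ == "__main__":
    mp.set_start_method("fork")       # the note's os.fork()-per-thread model (Linux)
    procs = [mp.Process(target=worker, args=(w, shared_humans))
             for w in range(N_WORKERS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # Writes made by the children are visible here because memory is shared.
    print(shared_humans[0].zone, shared_humans[0].next_event_time)
```

The Python-level Environment/Human subclasses would wrap records like this and add behaviour; the hot per-record loops are what would move into C.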
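For the "sending a Human to another process" part, here is an equally rough sketch of one possible handoff: a per-process inbox of Human indices living in shared memory, which the receiver drains and merges into its own event priority queue. The note imagines this done in C directly against the receiver's priority queue; the FIFO-inbox-plus-heapq shape below, and all the names (Inbox, send_human, drain_inbox), are my own simplification for illustration. It assumes records shaped like BaseHumanRecord from the previous sketch.

```python
# Sketch of a cross-process Human handoff: each worker gets an inbox ring
# buffer of Human indices in shared memory.  "Sending" a Human is writing
# its index into the destination worker's inbox; no pickling, no copying.
# A C implementation could replace the per-inbox Lock with atomics.
import ctypes
import heapq
import multiprocessing as mp

INBOX_CAPACITY = 4096      # arbitrary illustration size

class Inbox(ctypes.Structure):
    _fields_ = [
        ("head",  ctypes.c_int64),
        ("tail",  ctypes.c_int64),
        ("slots", ctypes.c_int64 * INBOX_CAPACITY),
    ]

def make_inboxes(n_workers):
    """One shared-memory inbox plus one lock per worker (create before forking)."""
    boxes = mp.Array(Inbox, n_workers, lock=False)
    locks = [mp.Lock() for _ in range(n_workers)]
    return boxes, locks

def send_human(boxes, locks, dest_worker, human_index):
    """Hand Human #human_index to dest_worker by queueing its index."""
    box = boxes[dest_worker]
    with locks[dest_worker]:
        if box.tail - box.head >= INBOX_CAPACITY:
            raise RuntimeError("destination inbox full")  # real code: backpressure
        box.slots[box.tail % INBOX_CAPACITY] = human_index
        box.tail += 1

def drain_inbox(boxes, locks, humans, my_worker, my_event_heap):
    """Pull handed-over indices and merge them into our local priority queue."""
    box = boxes[my_worker]
    with locks[my_worker]:
        pending = []
        while box.head < box.tail:
            pending.append(box.slots[box.head % INBOX_CAPACITY])
            box.head += 1
    for idx in pending:
        # humans is the flat shared array from the previous sketch.
        heapq.heappush(my_event_heap, (humans[idx].next_event_time, idx))
```

In the worker loop from the first sketch, each process would call drain_inbox at the top of every step and send_human whenever one of its Humans moves to a zone owned by another process.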