# High-performance modmesh
###### tags: `modmesh`
We want to make modmesh run very fast. Steps to take:
1. Add in-house runtime profiling code, starting with a scope-based profiler.
2. Design and implement cache-friendly constructs.
* SimpleArray is already cache-friendly. We need to profile to find the runtime hotspots and confirm that the cache is being used effectively.
3. On top of the cache-friendly code, add SIMD (data parallelism).
* x86, Neon, Apple Silicon.
4. On top of the SIMD-enabled code, add stream-processing (GPU) code (only for data parallelism).
* Need controls for Apple Silicon, Intel, Nvidia, and AMD.
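
As a starting point for step 1, a scope-based profiler can be sketched with RAII: a timer object records the time when it is constructed and reports the elapsed time when it goes out of scope. The sketch below is only an illustration under assumptions. The names `ProfileRegistry`, `ProfileEntry`, `ScopedTimer`, and `sum_squares` are hypothetical and are not modmesh's actual API, and the registry is not thread-safe.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <utility>

// All names here are hypothetical; this is not modmesh's actual profiler.
struct ProfileEntry
{
    std::uint64_t count = 0; // number of times the scope was entered
    double seconds = 0.0; // accumulated wall time in seconds
};

// Process-wide table of timing results, keyed by scope name.
// Assumption: single-threaded use; a real profiler needs locking or
// per-thread storage.
class ProfileRegistry
{
public:
    static ProfileRegistry & instance()
    {
        static ProfileRegistry registry;
        return registry;
    }

    void add(std::string const & name, double seconds)
    {
        ProfileEntry & entry = m_entries[name];
        entry.count += 1;
        entry.seconds += seconds;
    }

    // Return a copy of the entry, or a zeroed entry if the name is unknown.
    ProfileEntry entry(std::string const & name) const
    {
        auto const it = m_entries.find(name);
        return it == m_entries.end() ? ProfileEntry{} : it->second;
    }

    void report(std::ostream & out) const
    {
        for (auto const & item : m_entries)
        {
            out << item.first << ": " << item.second.count << " calls, "
                << item.second.seconds << " s\n";
        }
    }

    void clear() { m_entries.clear(); }

private:
    std::map<std::string, ProfileEntry> m_entries;
};

// Measures the lifetime of the enclosing scope and records it on destruction.
class ScopedTimer
{
public:
    using clock_type = std::chrono::steady_clock;

    explicit ScopedTimer(std::string name)
        : m_name(std::move(name))
        , m_start(clock_type::now())
    {
    }

    ~ScopedTimer()
    {
        std::chrono::duration<double> const elapsed = clock_type::now() - m_start;
        assert(elapsed.count() >= 0.0); // steady_clock is monotonic
        ProfileRegistry::instance().add(m_name, elapsed.count());
    }

    ScopedTimer(ScopedTimer const &) = delete;
    ScopedTimer & operator=(ScopedTimer const &) = delete;

private:
    std::string m_name;
    clock_type::time_point m_start;
};

// Example workload: the timer covers the whole function body.
double sum_squares(std::size_t n)
{
    ScopedTimer timer("sum_squares");
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        total += static_cast<double>(i) * static_cast<double>(i);
    }
    return total;
}
```

To use it, call `sum_squares` (or any function that holds a `ScopedTimer`) and then `ProfileRegistry::instance().report(std::cout)` to print the call counts and accumulated times. Keeping the instrumentation to a single local object makes it easy to add to or remove from a suspected hotspot.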