How does torch.compile() achieve massive speedups despite highly optimized NumPy functions? [D]