
Cuda

Optimizing for concurrency has always been painful for a ton of reasons. SIMD is one approach that helps with this problem. CUDA follows a closely related model, SIMT (Single Instruction, Multiple Threads): compute is segmented by thread index, and performance scales by vastly increasing the number of threads, instead of tying parallelism to the shape of the data, which could be dynamic and constrain hardware optimization.
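A minimal sketch of what that segmentation looks like in practice (names and numbers are mine, not from any particular post): each thread computes its own global index and handles exactly one element, and the launch configuration, not the data, decides how many threads exist.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// SIMT in miniature: every thread runs the same instruction stream,
// but on the element selected by its own thread index.
__global__ void scale(float *x, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)             // guard: thread count may exceed data size
        x[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *x;
    cudaMallocManaged(&x, n * sizeof(float));
    for (int i = 0; i < n; i++) x[i] = 1.0f;

    // Fix the threads-per-block, then launch enough blocks to cover n.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(x, 2.0f, n);
    cudaDeviceSynchronize();

    printf("x[0] = %f\n", x[0]);
    cudaFree(x);
    return 0;
}
```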

This has worked incredibly well for at least one company, which seems to see no end to its growth.

atomicAdd(valueAddress, increment)

This function performs an atomic read-modify-write: it adds increment to the value at valueAddress without any other thread's update getting lost. Quite helpful when many threads need to update a shared value. But when the per-thread work is very shallow (e.g. summing the values in a list), it's not very useful, because instead of doing useful work, all the threads are simply waiting on each other for access to the same address.
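A sketch of that contention problem and the usual mitigation (kernel names and the block size of 256 are my own choices): the naive version makes every thread hit one global address, while the common fix reduces within each block in shared memory first, so only one atomicAdd per block touches global memory.

```cuda
#include <cuda_runtime.h>

// Naive sum: every thread hammers the same global address, so the
// atomics serialize and the threads mostly wait on each other.
__global__ void sumNaive(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(out, in[i]);
}

// Common fix: reduce within the block in shared memory, then issue a
// single atomicAdd per block. Assumes blockDim.x == 256.
__global__ void sumBlocked(const int *in, int *out, int n) {
    __shared__ int partial[256];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    partial[tid] = (i < n) ? in[i] : 0;
    __syncthreads();

    // Tree reduction inside the block: halve the active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            partial[tid] += partial[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        atomicAdd(out, partial[0]);  // one contended write per block
}
```

The second kernel trades a little shared-memory bookkeeping for far fewer contended atomic operations, which is usually the right trade when the per-element work is shallow.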

Posts


✚      Color inversion , Nov 12,'25   

🔄      Count 2d Array Element , Nov 12,'25   

🔄      Count Array Element , Nov 12,'25   

◀️      Leaky ReLu , Nov 12,'25   

◀️      Matrix Addition , Nov 12,'25   

©️      matrix copy , Nov 12,'25   

X      Matrix Multiply , Nov 12,'25   

T      Matrix Transpose , Nov 12,'25   

🌈      Rainbow Table , Nov 12,'25   

◀️      ReLu , Nov 12,'25   

◀️      Reverse Array , Nov 12,'25   

🔄      SiLu , Nov 12,'25   

©️      simple inference , Nov 12,'25   

🔄      SwiLu , Nov 12,'25   

✚      Vector Addition , Nov 12,'25