LLM Code generation notes

I explore how LLMs generate code.

The plan is to do binary analysis with LLMs. The idea is to see whether LLMs can understand the “language” of binaries and recover the underlying vulnerability.
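As a rough illustration of what "showing a binary to an LLM" could look like, here is a minimal sketch that renders raw bytes as a hex dump and wraps them in a prompt. The function names `hexdump` and `build_prompt` and the prompt wording are assumptions made for illustration; they are not part of any existing tool, and the sketch makes no call to an actual model.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset-prefixed hex rows, like a classic hex dump."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # bytes.hex(sep) needs Python 3.8+
        rows.append(f"{offset:08x}  {chunk.hex(' ')}")
    return "\n".join(rows)


def build_prompt(data: bytes) -> str:
    """Wrap the hex dump in a hypothetical vulnerability-analysis prompt."""
    return (
        "The following is a hex dump of a compiled function.\n"
        "Describe what it does and point out any likely vulnerability.\n\n"
        + hexdump(data)
    )


if __name__ == "__main__":
    # A typical x86-64 function prologue: push rbp; mov rbp, rsp
    print(build_prompt(b"\x55\x48\x89\xe5"))
```

In practice one would probably feed disassembly or decompiler output rather than raw hex, since that is closer to text the model has seen in training; the hex dump is only the simplest starting point.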

High Level Topics

  • Decompilation
  • Code creation for tasks.

Why does ChatGPT do better in Julia than in Python?

  • Julia code and its stdlib are more consistent than Python's.
  • GPT trips up in the same ways that new Python learners do.
  • There is a huge amount of Python data, written by people at very different experience levels, so the training data is inconsistent. Julia, by contrast, is usually written by highly specialized people, so the generated code stays at a specific level.

Codegen tools

Security in generated code

Decompilation projects

Datasets

Running AI-generated code

Other research papers

Tasks