🤖

AI governance and threat modelling

AI/ML applications bring with them a new class of problems and solutions. These are my notes from learning about this topic.

How are ML systems measured?

During training, this is the loss and the accuracy of the model's predictions.
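As a minimal sketch of those two training metrics, here is how cross-entropy loss and accuracy can be computed over a batch of class-probability predictions (the function names and the toy numbers are illustrative, not from any particular framework):

```python
import math

def cross_entropy(probs, label):
    # Negative log-likelihood of the true class; lower is better.
    return -math.log(probs[label])

def accuracy(pred_probs, labels):
    # Fraction of examples where the argmax class matches the label.
    correct = sum(
        1 for probs, y in zip(pred_probs, labels)
        if max(range(len(probs)), key=probs.__getitem__) == y
    )
    return correct / len(labels)

preds = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3]]
labels = [0, 1, 2]
print(accuracy(preds, labels))            # 2 of 3 argmax predictions are correct
print(cross_entropy(preds[0], labels[0])) # -ln(0.7)
```

Real training loops compute the same quantities with framework ops (e.g. a cross-entropy loss layer) over mini-batches, but the definitions are these.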

For LLMs there are benchmarks like:

  • ARC (AI2 Reasoning Challenge).

    • A knowledge-and-reasoning benchmark, unlike pure Q&A datasets such as SQuAD (Stanford Question Answering Dataset) and SNLI (Stanford Natural Language Inference).
    • Evidence is distributed across sentences, meaning the answer cannot be extracted from a single passage; it must be pieced together.
  • API-Bank, which evaluates LLM tool usage: whether the model can plan, retrieve, and call APIs correctly.

  • HellaSwag, which evaluates the common-sense reasoning of LLMs using "adversarial endings": machine-generated sentence completions that look plausible to models but are easy for humans to reject.

  • GLUE (General Language Understanding Evaluation) and SuperGLUE, and MMLU (Massive Multitask Language Understanding).
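Several of the benchmarks above (ARC, HellaSwag, MMLU) are multiple-choice: the model scores each candidate answer and the highest-scoring one is compared to the gold label. A hedged sketch of that evaluation loop, with a toy word-overlap scorer standing in for a real model's log-likelihood (everything here is illustrative, not any benchmark's official harness):

```python
def evaluate_multiple_choice(items, score_fn):
    # items: list of (context, candidate_endings, gold_index).
    # The prediction is the ending the scorer ranks highest.
    correct = 0
    for context, endings, gold in items:
        scores = [score_fn(context, e) for e in endings]
        pred = max(range(len(scores)), key=scores.__getitem__)
        correct += int(pred == gold)
    return correct / len(items)

def overlap_score(context, ending):
    # Toy stand-in for a model score: count ending words seen in the context.
    ctx_words = set(context.lower().split())
    return sum(1 for w in ending.lower().split() if w in ctx_words)

items = [
    ("She cracked the eggs into the bowl",
     ["and whisked the eggs", "and drove to work", "and painted the wall"],
     0),
]
print(evaluate_multiple_choice(items, overlap_score))  # 1.0 on this toy item
```

In a real harness, `score_fn` would be the (often length-normalized) log-probability the LLM assigns to each ending given the context; the accuracy aggregation is the same.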