AI governance and threat modelling
AI/ML applications bring with them a new class of problems and solutions. These are my notes from learning about this.
How are ML systems measured?
During training, this is the loss and the accuracy of the model's predictions.
For LLMs there are benchmarks such as:
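A minimal sketch of those two training-time measurements for a classifier, assuming the model outputs a probability for each class and the loss is cross-entropy (a common choice, not the only one). The function name and the batch values are made up for illustration.

```python
import math


def cross_entropy_and_accuracy(probs, labels):
    """Return (mean cross-entropy loss, accuracy) for a batch of predictions.

    probs:  one list of class probabilities per example (each sums to 1)
    labels: index of the correct class for each example
    """
    total_loss = 0.0
    correct = 0
    for p, y in zip(probs, labels):
        # Loss: negative log of the probability given to the true class.
        total_loss += -math.log(p[y])
        # Accuracy: the predicted class is the one with the highest probability.
        predicted = max(range(len(p)), key=lambda i: p[i])
        correct += int(predicted == y)
    n = len(labels)
    return total_loss / n, correct / n


# Hypothetical batch: three examples, three classes.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
labels = [0, 1, 0]
loss, acc = cross_entropy_and_accuracy(probs, labels)
print(f"loss={loss:.3f} accuracy={acc:.2f}")
```

The third example is misclassified, so accuracy is 2/3, while the loss also reflects how confident each prediction was.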
ARC (AI2 Reasoning Challenge)
- Tests knowledge and reasoning, in contrast to earlier datasets such as SQuAD (Stanford Question Answering Dataset) and SNLI (Stanford Natural Language Inference).
- Distributed evidence: the facts needed to answer a question are spread across several sentences rather than contained in a single one.
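As I understand the ARC paper's scoring rule, each question is worth one point, and a system that returns a k-way tie containing the correct answer gets 1/k of a point. A small sketch of that rule; the function name and the example predictions are hypothetical.

```python
def arc_score(predictions, answers):
    """ARC-style score as a percentage.

    predictions: for each question, the set of answer labels the system chose
                 (more than one label means a tie between those options)
    answers:     the correct label for each question
    """
    total = 0.0
    for chosen, gold in zip(predictions, answers):
        # 1/k credit for a k-way tie that includes the correct answer; 0 otherwise.
        if gold in chosen:
            total += 1 / len(chosen)
    return 100 * total / len(answers)


# Hypothetical predictions for four questions.
preds = [{"A"}, {"B", "C"}, {"D"}, {"A", "B", "C", "D"}]
gold = ["A", "C", "B", "D"]
print(arc_score(preds, gold))
```

Here the credits are 1, 1/2, 0 and 1/4, so guessing every option on a four-way question is worth no more than random chance.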
API-Bank, evaluation of how well LLMs use tools (API calls).
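The sketch below is a simplified stand-in for the idea of scoring tool use: a call counts as correct only if the API name and parameters exactly match a reference call. The dictionary format and the `GetWeather`/`AddAlarm` APIs are hypothetical, and API-Bank's real scoring may differ (for example, by comparing execution results rather than raw arguments).

```python
def call_matches(predicted, expected):
    """Simplified check: same API name and exactly the same parameters."""
    return (predicted.get("api_name") == expected["api_name"]
            and predicted.get("parameters") == expected["parameters"])


def api_call_accuracy(predicted_calls, expected_calls):
    """Fraction of reference calls the model reproduced exactly."""
    hits = sum(call_matches(p, e) for p, e in zip(predicted_calls, expected_calls))
    return hits / len(expected_calls)


# Hypothetical call format and APIs, for illustration only.
expected = [
    {"api_name": "GetWeather", "parameters": {"city": "Paris"}},
    {"api_name": "AddAlarm", "parameters": {"time": "07:00"}},
]
predicted = [
    {"api_name": "GetWeather", "parameters": {"city": "Paris"}},
    {"api_name": "AddAlarm", "parameters": {"time": "7am"}},  # different time format
]
print(api_call_accuracy(predicted, expected))
```

The second call picks the right API but uses a different time format, so this strict check counts it as wrong.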
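HellaSwag, evaluates the commonsense reasoning of LLMs with “adversarial endings”: the model must choose the most plausible ending for a scenario from candidates that include adversarially generated wrong ones.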
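One common way to run a multiple-choice benchmark like HellaSwag on an LLM is to score each candidate ending by the log-probability the model assigns to it and pick the highest, often after normalising by the ending's length so longer endings are not penalised. A sketch assuming the per-ending log-probabilities and lengths were already computed by some model; the function name and numbers are invented.

```python
def pick_ending(logprobs, lengths):
    """Index of the ending with the highest length-normalised log-probability.

    logprobs: total log-probability the model assigns to each candidate ending
    lengths:  length of each ending (e.g. tokens or characters)
    """
    scores = [lp / n for lp, n in zip(logprobs, lengths)]
    return max(range(len(scores)), key=lambda i: scores[i])


# Invented numbers for one question with four candidate endings.
logprobs = [-12.0, -20.0, -9.0, -15.0]
lengths = [6, 16, 3, 10]
print(pick_ending(logprobs, lengths))
```

With raw sums the shortest ending (index 2) would win; dividing by length picks index 1 instead. Accuracy is then the fraction of questions where the picked index equals the correct one.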
GLUE (General Language Understanding Evaluation), SuperGLUE and MMLU (Massive Multitask Language Understanding).
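Suites like GLUE report a single headline number. For GLUE this is, as far as I know, an unweighted average over tasks, where tasks reported with two metrics are averaged first. A sketch of that macro-average; the scores are invented, and only the task names and metric types come from GLUE.

```python
def glue_style_score(task_scores):
    """Macro-average: average each task's metrics, then average tasks equally."""
    per_task = [sum(metrics) / len(metrics) for metrics in task_scores.values()]
    return sum(per_task) / len(per_task)


# Invented scores; only the task names and metric types are from GLUE.
scores = {
    "CoLA": [50.0],        # Matthews correlation
    "SST-2": [90.0],       # accuracy
    "MRPC": [80.0, 86.0],  # F1 and accuracy
}
print(round(glue_style_score(scores), 2))
```

Because every task gets equal weight, a weak score on one small task moves the headline number as much as the same drop on a large one.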
last update: 2024-09-06