Optimize the software so that it detects it is running in the context of a benchmark and adjusts its method of functioning so that it performs optimally under those conditions but worse in real-life conditions.
How this might be reflected in an LLM environment: an LLM could be optimized to return lower-quality (but therefore faster and more energy-efficient) results when it detects it is being run against a known benchmark set of prompts, while prompts outside that benchmark set return higher-quality, more energy-intensive results.
Counter
Use usage-based measurements rather than benchmark-based measurements. For example, rather than measuring against a benchmark set of prompts, measure the emissions of the entire infrastructure of an LLM software product (SaaS/application) every day, then divide by the total number of prompts users made during that day. By using actual user data we get a more realistic figure that is more directly useful to the end user, and it is not easy to manipulate.
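The per-prompt calculation above can be sketched in a few lines. This is a minimal illustration, not an implementation from the project: the function name, parameters, and the example figures (1,200 kg CO2e/day, 4,000,000 prompts/day) are all hypothetical.

```python
def emissions_per_prompt(daily_emissions_kg_co2e: float, prompts_served: int) -> float:
    """Average emissions attributable to one user prompt, measured over
    the whole infrastructure for one day (hypothetical helper)."""
    if prompts_served <= 0:
        raise ValueError("prompts_served must be positive")
    return daily_emissions_kg_co2e / prompts_served

# Illustrative numbers: 1,200 kg CO2e across the whole stack,
# 4,000,000 user prompts served that day -> roughly 0.0003 kg CO2e per prompt.
print(emissions_per_prompt(1200.0, 4_000_000))
```

Because the numerator covers all infrastructure emissions from real traffic, a model that games known benchmark prompts gains nothing here; its everyday behaviour is what gets measured.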