Cyber & IT Supervisory Forum - Additional Resources

Measure 2.3: AI system performance or assurance criteria are measured qualitatively or quantitatively and demonstrated for conditions similar to deployment setting(s). Measures are documented.

About

The current risk and impact environment suggests that AI system performance estimates are insufficient on their own and require a deeper understanding of the deployment context of use. Computationally focused performance testing and evaluation schemes are restricted to test data sets and in silico techniques. These approaches do not directly evaluate risks and impacts in real-world environments; they can only predict what might create impact based on an approximation of expected AI use. To properly manage risks, more direct information is necessary to understand how and under what conditions deployed AI creates impacts, who is most likely to be impacted, and what that experience is like.

Suggested Actions

- Conduct regular and sustained engagement with potentially impacted communities.
- Maintain a demographically diverse, multidisciplinary, and collaborative internal team.
- Regularly test and evaluate systems in non-optimized conditions, in collaboration with AI actors in user interface and user experience (UI/UX) roles.
- Collaborate with socio-technical, human factors, and UI/UX experts to identify notable characteristics in the context of use that can be translated into system testing scenarios.
- Measure AI systems prior to deployment in conditions similar to expected scenarios.
- Measure and document performance criteria such as validity (false positive rate, false negative rate, etc.) and efficiency (training times, prediction latency, etc.) relative to ground truth within the deployment context of use.
- Evaluate feedback from stakeholder engagement activities, in collaboration with human factors and socio-technical experts.
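The validity and efficiency criteria named above (false positive rate, false negative rate, prediction latency) can be computed and documented with a small amount of code. The sketch below is illustrative, not part of the guidance: the function names, the toy labels, and the stand-in `predict_fn` are assumptions for the example.

```python
import time


def validity_metrics(y_true, y_pred):
    """Compute false positive and false negative rates against ground truth.

    y_true, y_pred: sequences of 0/1 labels (1 = positive class).
    """
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    return {
        "false_positive_rate": fp / negatives if negatives else 0.0,
        "false_negative_rate": fn / positives if positives else 0.0,
    }


def prediction_latency(predict_fn, inputs):
    """Measure mean per-item prediction latency in seconds."""
    start = time.perf_counter()
    for x in inputs:
        predict_fn(x)
    return (time.perf_counter() - start) / len(inputs)


# Hypothetical evaluation on a handful of labeled samples from the
# deployment context (labels here are invented for illustration).
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(validity_metrics(y_true, y_pred))
# One false positive out of three negatives, one false negative out of
# three positives: both rates are 1/3.
```

The point of measuring these rates "relative to ground truth within the deployment context of use" is that the labeled samples should come from conditions resembling deployment, not only from the curated test set used during development.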
