✍️ Column by Jesse Dinneen, Olga Batura, Caecilia Zirn, Sascha Donner, Azad Abad, Florian Loher.
Photo credits: Darya Shramko
Overview: In this column, we report on our experience as one of the teams participating in the innovative “Grand Challenge EU AI Act” event at the University of St. Gallen and provide subsequent reflections on AI applications and the perceived strengths and weaknesses of the Act.
The Competition
On July 18-19, 2023, a team of organizers from the University of St. Gallen (HSG) hosted a research competition where 11 international teams (selected from ~30 who applied) competed to examine how interpretable and actionable the amended (but still hotly debated) EU AI Act is in practice. The teams were required to develop and apply tools and methods to assess the compliance of real-life AI systems with the AI Act and were then judged on their written reports. Each team attended several demos of AI applications from representatives across different industries and then held interviews with the AI providers before submitting reports assessing the applications’ compliance with the EU AI Act and recommending how compliance might be increased or otherwise improved. After some months of preparation and an in-person bootcamp to trial the interview and assessment methods, two intense days of demos, interviews, report writing, and jurying took place in St. Gallen, Switzerland.
Our Team and Approach: AI JFaccs, the authors of this summary, were a team of six: Prof. Jesse Dinneen (Humboldt-Uni Berlin), Dr. Olga Batura (freelance lawyer and public policy consultant), Sascha Donner, LLM (digital transformation consultant and Ph.D. candidate, Humboldt-Uni Berlin), Florian Loher (AI development manager), and Dr. Caecilia Zirn and Dr. Azad Abad (both AI experts). As the competition rulebook lays out in detail, each team’s exact approach was left to the team to decide, but the goal of the submitted reports was to make recommendations to the AI providers for achieving or increasing the compliance of their products with the EU AI Act. Our approach was therefore to gather information at the demos and solicit further information during the team-provider interviews, in order to assess which set of obligations from the AI Act applied to each AI product (according to the risk level we determined) and what compliance had already been achieved, demonstrated, or was at least known to the providers. In our report for each provider, we summarized what we found, offered brief ethical reflections where applicable, and made recommendations for increasing compliance, for example by implementing relevant standards, obtaining certifications, preparing documentation, or modifying internal practices, perspectives, or plans for the product.
Challenging conditions of the competition: Despite the preparation time leading up to the event, all teams faced tight time constraints: it was difficult to extract the necessary information from AI providers during the short product demos and interviews, especially as the representatives were often engineers or salespeople relatively unfamiliar with compliance. It was also rather taxing to undertake demo-watching, interview-holding, results-reviewing, and recommendation-writing for four providers in one long day (plus some early hours of the next day). We also had little to no advance information about the AI applications that could help us gauge compliance or identify issues (e.g., data descriptions, test results, relevant benchmarks), which meant teams were doing something roughly analogous to assessing a car’s safety just by hearing about the car from its engineers. We gather it was similarly challenging for the jury to evaluate 44 reports (11 teams, 4 reports each) in just a few hours the next day, especially given the complexity of determining legal accuracy. The competition was, therefore, as much a test of the participants’ endurance and creativity as it was of the AI Act.
Outcome: Two winning teams – LegalAIzers and Conformity Mavericks – split a prize of CHF 100,000, which was well earned through compelling reports and impressive performances in the subsequent tie-breaker assessments. We were particularly happy that the jury appreciated the teams’ considerations of ethical principles and possible social/societal risks. Although we did not win, we were very happy to participate, learn more about emerging AI applications and potential compliance issues, and get a chance to network and meet the other international participants. The competition was an innovative approach to advancing an important, emerging topic and raising its profile; for example, the media were present and provided coverage. Other teams have shared positive experiences and reflections on the event (e.g., Emmie Hine of the LegalAIzers). We thank the organizers (especially Prof. Thomas Burri and Viktoriya Zakrevskaya), teams, providers, and jury.
Reflections on Emerging AI Applications and Regulation
With the Grand Challenge event outlined, we now share some reflections on the AI applications and on our experience of applying the EU AI Act, with the caveat that we are also dutifully respecting both the NDAs providers asked us to sign and the confidentiality section of the event’s rulebook.
Providers, applications, and technologies
The range of providers seen at the event reflects the fact that AI is no longer a niche interest for European businesses, as small start-ups and large corporations alike are developing and implementing it. The AI technology types and application areas were diverse, including personalized medicine, delivery robots, data visualization, construction technology, cybersecurity, automotive manufacturing, and in-car entertainment. All providers participated in good faith, openly presented their products and use cases, and patiently answered questions about all aspects of their products, from technical to legal and ethical. We had the impression that providers generally had a sincere interest in learning how to meet the requirements of the AI Act and avoid releasing a dangerous or harmful AI product.
There was some existing familiarity with the AI Act’s obligations, and even some compliance, among providers, though it sometimes depended on the size and maturity of the company. For example, some larger companies had established risk management systems, data governance (for the GDPR), and internal processes that only needed to be extended to cover the AI system explicitly. We were happy to see that some providers from smaller companies had also done some preparation for the AI Act and considered some potential ethical issues around their products. Regardless of size, most providers had implemented cybersecurity measures for their AI systems, and some had obtained, or aspired to obtain, relevant certifications. But some obligations were unfamiliar to providers, especially those related to record-keeping and technical documentation, such as the requirement to document the resources and energy consumed during the training and application of AI models.
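To make that last obligation more concrete, here is a minimal sketch (our own hypothetical illustration, not drawn from any provider at the event) of how a team might record the duration, hardware, and a rough energy estimate for a training run. The assumed power figure, hardware description, and log file name are placeholders that a real provider would replace with measured values and its own documentation conventions.

```python
import json
import time
from datetime import datetime, timezone

# Hypothetical values; a real provider would measure these (e.g., with the
# hardware vendor's power-monitoring tools) rather than assume them.
ASSUMED_AVG_POWER_WATTS = 300            # assumed average draw of the training hardware
HARDWARE_DESCRIPTION = "1x GPU server"   # placeholder description

def log_training_run(model_name: str, train_fn) -> dict:
    """Run a training function and record resource and energy figures for documentation."""
    started = time.time()
    train_fn()
    duration_h = (time.time() - started) / 3600
    record = {
        "model": model_name,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "hardware": HARDWARE_DESCRIPTION,
        "duration_hours": round(duration_h, 4),
        # Energy (kWh) ~= average power (kW) x duration (h); a rough estimate only.
        "estimated_energy_kwh": round(ASSUMED_AVG_POWER_WATTS / 1000 * duration_h, 4),
    }
    with open("training_resource_log.jsonl", "a") as f:  # hypothetical log file
        f.write(json.dumps(record) + "\n")
    return record
```

Even a simple record like this, kept consistently for every training run, gives a provider something concrete to point to when asked how resource and energy consumption is documented.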
One necessary step in assessing the obligations attached to an AI application was to identify whether it was in fact an AI system (or component) according to the AI Act’s definition. This was not always straightforward: some applications, for example, arguably used only complex algorithms or advanced statistical techniques rather than clear-cut AI. Similarly, but more worryingly, for some applications it was unclear whether AI necessarily improved the product (e.g., better performance or features, or reduced cost or complexity). For some AI products entering the market, traditional, safer techniques and technologies suffice (e.g., they perform as well or entail fewer risks) but are simply less exciting and marketable. Thus we can imagine one consequence of the AI Act’s introduction: many AI providers today are considering whether the benefits of adding AI to their products – including benefits like marketing and funding opportunities – outweigh the costs of meeting the Act’s obligations.
Beyond AI’s role in business tactics, the eager application of AI can also be objectionable because it is laden with risks, including many that are not obvious. Indeed, almost all applications we saw were high-risk according to the AI Act, and, worryingly, the providers’ awareness of possible risks seemed inversely related to the extent of those risks. This phenomenon was independent of the size of the company and of the representative’s role (most were engineers or salespeople). For example, when asked about a comprehensive risk management system (required by Art. 9 AI Act) for their AI-powered autonomous robot that would operate in public spaces, one representative responded that because they had implemented a computer vision technique to detect and avoid collisions with humans, there were no further risks to consider. In other words, it had never occurred to them that the robot might collide with animals (i.e., kill pets not detected by the human-detection algorithm), lose connection or power in critical places (e.g., block emergency vehicles), or enact biases present in the detection library’s training data (its performance was simply assumed because it was an open-source library, and had not yet been formally tested by the provider).
In another case, the AI provider argued that because a system, which included a large language model (LLM), could not cause immediate physical harm, it could also not entail or enact any bias against its users. In several cases, providers assumed that customers using their products would be sufficiently literate about AI to understand its limitations and risks (Art. 13 AI Act) and would ensure sufficient human oversight (Art. 14 AI Act); the providers therefore considered the relevant obligations met or inapplicable. We find these to be worrisome indications of how AI development in Europe is likely proceeding at top firms and garage projects alike, and they demonstrate the importance of effective regulation of, and education about, AI (including education of AI providers).
Applying the EU AI Act
A challenge we faced was assessing the risk category of AI applications, which is necessary for determining what obligations an AI provider has under the AI Act: providers of high-risk applications have considerably more duties and obligations in the design and monitoring of the product, as well as in documenting and reporting their actions. The risk category depends on the general application area and the intended use (and reasonably foreseeable misuse), which are often not obvious. So unless an application is clearly high-risk because it is covered by Annex II (e.g., medical devices, toys), its possible uses and consequences must be carefully envisioned and considered. While we had limited time in the competition to interpret each application and its possible use cases, AI providers (and/or compliance officers) must thoroughly consider, envision, and learn about possible use (and abuse) cases to arrive at the right risk classification; they cannot rely on a single intended use.
We also observed in interviews and presentations that, once an AI application was classified as high-risk, the compliance assessment focused mainly on the additional requirements specific to high-risk AI systems (Arts. 8-15 AI Act); as a result, the general principles of Art. 4a AI Act, which apply to all AI systems, were neglected (i.e., considered only when the application was low-risk). Of course, high-risk AI applications must be rigorously checked for compliance with the specific additional requirements. Still, it is arguably even more important to comply with the general principles, which encourage reflection on broad risks and issues rather than tempting a provider merely to amass documentation and certifications. Documentation and certifications are necessary to demonstrate compliance and are likely to prevent issues, but such reflection is no less important for identifying and preventing them.
Similarly, the obligations for post-market monitoring of high-risk AI applications (Art. 61 AI Act) were mostly neglected; providers were generally unaware of these obligations, and most teams did not report on them. Such monitoring is important because it facilitates detecting malfunctions and other problems during operation once the product is launched. AI-powered products and their uses must be adapted to fit our ever-changing world, especially since they promise new levels of autonomy and performance. Importantly, a plan for post-market monitoring needs to be established already at the stage of product development so that it can be followed at and after product launch. Not only is this a legal requirement, but many monitoring aspects (e.g., record keeping) are deeply integrated into an AI system’s design and may require great effort to implement post hoc.
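To illustrate why such monitoring is easier to design in than to retrofit, here is a minimal, hypothetical sketch (ours, not taken from any provider) of a monitoring hook sitting directly in the prediction path: it keeps a sliding window of recent predictions and raises a flag when the share of low-confidence outputs exceeds a threshold. The class name, window size, and thresholds are assumptions; an actual post-market monitoring plan would specify what to track and when to escalate.

```python
from collections import deque

class PostMarketMonitor:
    """Sketch of an in-system monitoring hook for a deployed AI component.

    Keeps a sliding window of recent predictions and signals an alert when the
    share of low-confidence outputs exceeds a threshold, so that potential
    malfunctions can be reviewed as part of a post-market monitoring plan.
    """

    def __init__(self, window: int = 1000, low_conf: float = 0.5, alert_rate: float = 0.2):
        self.low_conf = low_conf        # confidence below this counts as "low"
        self.alert_rate = alert_rate    # alert if this share of the window is low-confidence
        self.recent = deque(maxlen=window)

    def record(self, prediction, confidence: float) -> bool:
        """Store one prediction; return True if an alert should be raised."""
        self.recent.append((prediction, confidence))
        low = sum(1 for _, c in self.recent if c < self.low_conf)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and low / len(self.recent) >= self.alert_rate
```

Routing every model call through such a hook from day one is straightforward; bolting it onto a system that is already deployed may require reworking interfaces and data flows.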
Regarding our assessment of the compliance of AI systems with the exact obligations of the Act: despite our team’s combined legal and technical expertise, we sometimes had difficulty following, interpreting, and explaining the currently proposed legal provisions. The provisions of Arts. 8-15 (requirements for high-risk AI systems) are often fairly high-level, and their wording is not always unequivocal. Thus even when we distilled the obligations and formulated our questions for AI providers, they still sometimes struggled to understand them, for example, to grasp what technical documentation is required beyond a development log and user manual, what record-keeping means exactly, and how AI users should be trained. Consider Art. 12 (2) AI Act on record-keeping, which states: “The logging capabilities shall ensure a level of traceability of the AI system’s functioning throughout its lifecycle that is appropriate to the intended purpose of the system.” Yet it is likely not obvious to developers what level of logging is detailed enough, what exactly needs to be logged, for how long, and where, in order to fulfill this requirement. Policymakers could therefore develop bylaws and guidelines for applying the AI Act, to facilitate both compliance and its assessment. This would benefit everyone seeking compliance, especially SMEs and startups with limited resources to hire or consult specialized lawyers to help them understand their obligations. Significant efforts are also needed to educate and train AI developers (engineers, managers, etc.) to raise their awareness of the importance and application of the AI Act to the tools they create, and of the relevant ethical considerations.
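Returning to the record-keeping example: the sketch below shows, purely as a hypothetical starting point, what per-inference logging for traceability could look like using only the Python standard library. The log file name, the chosen fields, and the decision to store a hash of the input rather than the raw input are our assumptions; what level of detail is “appropriate to the intended purpose” still has to be decided per system, which is exactly the interpretive gap that guidelines could close.

```python
import hashlib
import json
import logging
from datetime import datetime, timezone

# One JSON record per inference event, written via the standard logging module.
# File name, retention period, and fields are assumptions, not prescriptions.
logging.basicConfig(filename="ai_system_events.log", level=logging.INFO, format="%(message)s")
logger = logging.getLogger("ai_traceability")

def log_inference(model_version: str, input_data: bytes, output, confidence: float) -> None:
    """Record one inference so the system's behaviour can be traced after the fact."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        # Hash instead of raw input: keeps the log useful for tracing without
        # storing personal or proprietary data directly in it.
        "input_sha256": hashlib.sha256(input_data).hexdigest(),
        "output": str(output),
        "confidence": confidence,
    }
    logger.info(json.dumps(event))
```

A guideline or standard could then specify, for a given class of systems, which fields such a log must contain and how long it must be retained, sparing each developer the need to interpret the provision from scratch.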
Using standards and certifications (for record-keeping, accuracy and robustness of the AI system, technical documentation, and other issues) helps providers demonstrate compliance. However, AI-specific standards and certifications are currently lacking; they are needed to avoid divergent interpretations of how to read and comply with the AI Act, and to simplify compliance in practice for often technically-minded SMEs and startups. In the meantime, other standards can be applied by analogy, as mapped and recommended by the European Commission’s Joint Research Centre.
Conclusion
Unsurprisingly, the AI Act is imperfect and requires some revisions, for example to its definitions (e.g., what an AI component is), its conditions (e.g., who will be held responsible for integrated open-source code), and the specificity and intelligibility of its phrasing. But such minor revisions seem tractable.
Some have expressed broader concerns that the AI Act will be too difficult to apply in practice, excluding SMEs from AI development or sinking the European economy, and generally introducing bureaucracy rather than achieving its goals. Such doomsaying is reminiscent of the early days of the GDPR, when people wanting to avoid regulation made similar claims, and it is perhaps exactly the kind of worry the Grand Challenge event was designed to test. But it was our experience (shared with ten other teams and several AI providers) that even under the challenging circumstances of a two-day competition, and despite the room for improvement identified, it is feasible to assess and improve compliance with the AI Act.
Of course, the proposed EU AI Act adds hurdles to AI development, and this comes at a cost. But such hurdles are important for slowing or preventing problematic applications, and the AI Act’s benefits are unlikely to be achieved without regulatory hurdles and pressure. It is also true that, even if some obligations for SMEs may be lifted by proposed amendments, such obligations are proportionally larger barriers for SMEs, since they have fewer resources to commit to compliance. But considering the power and widespread adoption of AI, with which society is still reckoning, and the relative ease with which it can be implemented today in disruptive and dangerous applications, we view it as a positive for society that such hurdles might slow someone, whether a group of students or a tech giant, from launching something harmful. We hope that seeing such serious legal obstacles to moving fast and potentially breaking things encourages startups and giants alike to try harder to take ethics and fundamental rights seriously.
Unsurprisingly, most of the applications we saw at the Grand Challenge competition were not limited to the EU; the products could be launched equally effectively in other regions (e.g., analytical tools for manufacturing) and adopted and used across geographic and political boundaries. Geographically inclusive regulation is therefore virtually inevitable, and it must be wide-reaching if it is to be effective. Indeed, the United Nations is now establishing a Multistakeholder Advisory Body on Artificial Intelligence, a further step in the right direction. Given the Act’s benefits and practicability (as we observed), we think it is a good start to regulating AI in the EU, and, thanks to the Brussels Effect, it is likely to constitute a good start in other places as well.