Research summary: What does it mean for ML to be trustworthy?

July 27, 2020

Summary contributed by Connor Wright, a third-year Philosophy student at the University of Exeter.

Link to original source + author at the bottom.


Mini-summary: With the world increasingly governed by machine learning algorithms, how can we make sure this process can be trusted? Nicolas Papernot’s 33-minute video presents his group’s findings on how to do just that. Ranging over robustness, Lp norms, differential privacy, and deepfakes, the video focuses on two approaches to making ML trustworthy: admission control at test time, and model governance. Both are considered at length and proposed as ways to make ML more trustworthy through the privacy improvements they bring, though neither is foolproof. The talk ends with a thought-provoking conclusion on aligning ML with human norms.


Full summary:

In this video, Nicolas Papernot presents his group’s work on making ML more trustworthy, touching on Lp norms, differential privacy, admission control at test time, model governance, and deepfakes. These topics are treated as sections of this summary, ending with the conclusions Papernot draws from the research.

Trustworthy ML:

To determine how to make ML more trustworthy, the group first needed to establish what trustworthy ML would look like. One ingredient is making a model robust to the threat of adversarial examples. Here, Lp norms were used: the model is constrained to behave as a constant predictor inside an Lp ball around each input, making it less sensitive to small perturbations. The Lp norms also suggested a new way of detecting adversarial examples, since such inputs exploit the excessive invariance that this constraint builds into the model, and that exploitation can itself be detected more clearly. The question, then, is whether this method can be used to better detect threats and to train models that are robust against them in the future.
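
To make the Lp-norm threat model concrete, here is a minimal sketch, in PyTorch, of crafting a perturbation confined to an L-infinity ball around an input: a generic FGSM-style step, not the specific construction or detection method from the presentation, with `model`, `x`, `y`, and `epsilon` as placeholder names.

```python
# Minimal sketch: perturbing an input within an L-infinity ball of radius epsilon
# (an FGSM-style step). Illustrates the Lp-norm threat model only, not the
# detection method described in the talk.
import torch
import torch.nn.functional as F

def linf_adversarial_example(model, x, y, epsilon=0.03):
    """Return a copy of x perturbed to increase the model's loss on label y."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step in the sign of the gradient; the perturbation's L-infinity norm is
    # exactly epsilon, so the result stays inside the Lp (p = infinity) ball.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # assumes inputs scaled to [0, 1]
```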

This is answered by framing the question as an arms race. Traditional computer security has treated the problem like locking up a house: you lock the door to keep intruders out, then weigh up also locking the windows in case a bear breaks through, then worry about a hawk descending the chimney and consider sealing that too, and so on, creating an arms race against each new threat. Instead, using Lp norms to increase the robustness of the model, so that the intruders, bears, and hawks are detected in the first place, could be a way forward that saves time and money and increases trust in ML.

Privacy:

One way to increase trust in a model is through privacy. ML lives on data, and a data subject may want their data kept private even once it has helped train the model. The group therefore adopted “differential privacy” as their working definition, realised through their Private Aggregation of Teacher Ensembles (PATE) method. Instead of training one model on the whole dataset, the data is split into disjoint partitions and a “teacher” model is trained on each. Because the partitions do not overlap, each teacher is trained independently to solve the same task; the teachers’ predictions are then aggregated, with noise added, so that the final prediction is more private. Any individual’s data appears in only one partition and influences only one teacher, so it is very unlikely to sway any given prediction. The data is thus protected more effectively, because it has less impact on the output, helping to align ML’s version of privacy with the human norms that go by the same name.
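
As a rough illustration of the aggregation step, the sketch below adds Laplace noise to the teachers’ vote counts before taking the most-voted label. It is a simplification of PATE rather than the authors’ implementation, and the noise scale `1/gamma` is illustrative.

```python
# Minimal sketch of PATE-style noisy aggregation: each teacher, trained on a
# disjoint partition of the data, votes for a label; Laplace noise is added to
# the vote counts before taking the argmax, so no single training point (which
# influences only one teacher) can easily swing the answer.
import numpy as np

def noisy_aggregate(teacher_predictions, num_classes, gamma=0.1, rng=None):
    """teacher_predictions: integer class labels, one per teacher."""
    if rng is None:
        rng = np.random.default_rng()
    votes = np.bincount(teacher_predictions, minlength=num_classes)
    noisy_votes = votes + rng.laplace(loc=0.0, scale=1.0 / gamma, size=num_classes)
    return int(np.argmax(noisy_votes))

# Example: 250 teachers voting over 10 classes.
preds = np.random.default_rng(0).integers(0, 10, size=250)
label = noisy_aggregate(preds, num_classes=10)
```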

The video then splits into two main topic areas: admission control at test time, and model governance, which I shall now introduce in turn.

Admission control at test time:

One way for a model to abstain from making a prediction, and so support admission control, is to equip it with a measure of uncertainty, which the group tied back to the training data. They implemented the Deep K-Nearest Neighbours method, which let them open up the “black box” of the model and see what is happening in each layer (focusing on individual inputs rather than the model as a whole). For each layer, they look at how that layer represents the test input and perform a nearest-neighbour search in that layer’s representation space. Once this has been done for every layer, inconsistency among the labels of the nearest neighbours points to a problematic input.
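
The sketch below shows a simplified version of that per-layer nearest-neighbour check. It assumes the per-layer representations of the training data and of a single test input have already been extracted from the model, and the agreement score it returns is a stand-in for the credibility measure the method actually uses.

```python
# Minimal sketch of the Deep K-Nearest Neighbours idea: for each layer's
# representation of a test input, find the k nearest training points and check
# how consistent their labels are across layers.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def dknn_label_agreement(train_reps_per_layer, train_labels, test_reps_per_layer, k=5):
    """train_reps_per_layer / test_reps_per_layer: one representation array per layer."""
    neighbour_labels = []
    for train_reps, test_rep in zip(train_reps_per_layer, test_reps_per_layer):
        index = NearestNeighbors(n_neighbors=k).fit(train_reps)
        _, idx = index.kneighbors(test_rep.reshape(1, -1))
        neighbour_labels.append(train_labels[idx[0]])
    labels = np.concatenate(neighbour_labels)
    _, counts = np.unique(labels, return_counts=True)
    # Low agreement across layers signals an input the model should abstain on.
    return counts.max() / counts.sum()
```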

A question then arises: can objectives be defined at the level of each layer to make models more amenable to this kind of test-time check? The group examined this with the “soft nearest neighbour loss” presented at ICML 2019. They asked whether it is better for the model to learn representation spaces that separate the classes with a large margin (as a support vector machine would), or to entangle the different classes within the layers’ representations.

They found that the latter works better for the Deep K-Nearest Neighbours method’s uncertainty estimates, so they introduced the soft nearest neighbour loss to encourage the model to entangle points from different classes in its layers of representations. The loss encourages the model to share features between different classes in the lower layers (where inputs can be recognised using the same low-level features). This helped them identify uncertainty: previously, a test point that fell in none of the clusters forced a guess at its nearest neighbours, whereas with the classes entangled there is now support for these searches.
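
A minimal sketch of the soft nearest neighbour loss over a batch of hidden representations is given below. The temperature handling is simplified relative to the ICML 2019 formulation; maximising this quantity during training is what pushes representations of different classes to entangle.

```python
# Minimal sketch of the soft nearest neighbour loss over a batch of hidden
# representations. For each point, it compares similarity to same-class points
# against similarity to all other points; maximising it entangles the classes.
import torch

def soft_nearest_neighbour_loss(features, labels, temperature=1.0, eps=1e-8):
    """features: (b, d) hidden representations; labels: (b,) class labels."""
    b = features.shape[0]
    dists = torch.cdist(features, features).pow(2)             # pairwise squared distances
    sims = torch.exp(-dists / temperature)
    sims = sims * (1 - torch.eye(b, device=features.device))   # exclude self-similarity
    same_class = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    numerator = (sims * same_class).sum(dim=1)                  # same-class neighbours
    denominator = sims.sum(dim=1)                               # all other points
    return -torch.log(numerator / (denominator + eps) + eps).mean()
```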

Model governance:

The group explored the problem of removing a data subject’s data from a model when they no longer want it used, even though it has already helped train the model, which requires a form of “machine unlearning”. The question then becomes: is the differential privacy described above enough to make such time-consuming machine unlearning unnecessary? That is, does the fact that a data point affects only one teacher already achieve what machine unlearning would?

The group decided that it probably does not. The model’s parameters (updated, for example, by stochastic gradient descent) would still be influenced by the data subject’s original point, making it hard to remove their data entirely. Sharded, Isolated, Sliced, and Aggregated (SISA) training is therefore explored. The method splits the training data into shards, and each shard into slices, so that a given data point sits in only one slice of one shard; when that point must be removed, only the model for that shard needs retraining rather than the whole ensemble. Retraining is quicker still because checkpoints are saved as training moves through a shard’s slices, providing a launching pad for retraining the affected model.
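
The sketch below illustrates only the bookkeeping side of this idea: training indices are split into shards and slices, and an unlearning request locates the single shard and slice affected, so only that shard’s model would be retrained from the checkpoint preceding the affected slice. The shard and slice counts and the function names are illustrative, not taken from the work.

```python
# Minimal sketch of SISA-style bookkeeping: indices are split into shards, and
# each shard into slices, so any single point sits in exactly one slice of one
# shard. Unlearning a point then only requires retraining that one shard.
import numpy as np

def make_shards(indices, num_shards, num_slices, rng):
    """Shuffle the indices, split them into shards, and each shard into slices."""
    rng.shuffle(indices)
    return [np.array_split(shard, num_slices) for shard in np.array_split(indices, num_shards)]

def unlearn(shards, point_index):
    """Remove one point and report which shard/slice must be retrained."""
    for s, slices in enumerate(shards):
        for t, sl in enumerate(slices):
            if point_index in sl:
                # Drop the point; only shard s needs retraining, restarting from
                # the checkpoint saved before slice t rather than from scratch.
                shards[s][t] = sl[sl != point_index]
                return s, t
    raise ValueError("point not found")

rng = np.random.default_rng(0)
shards = make_shards(np.arange(10_000), num_shards=5, num_slices=4, rng=rng)
shard_id, slice_id = unlearn(shards, point_index=1234)
```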

Deepfakes:

The role of ML in deepfakes is then considered, with progress in ML accelerating progress in digital alteration. The group considered three approaches to combating this:

  1. Detect artifacts within the altered image (such as imperfect body movements).
  2. Reveal content provenance (secure record of all entities and systems that manipulate a particular piece of content).
  3. Advocate a notion of total accountability (record every minute of your life).

The group believed that none of these three methods would cover all of the problematic areas of deepfakes on its own, so they need to be supplemented by policy on areas such as predictive policing and feedback loops.

Conclusion:

The group concluded that research is needed to align ML with human norms. Once this is done, trustworthy ML becomes an opportunity to make ML better, and a cause that offers much food for thought for the future.


Original presentation by Nicolas Papernot et al.: https://youtu.be/UpGgIqLhaqo 

