
Remote explainability faces the bouncer problem

Abstract

The concept of explainability is envisioned to satisfy society’s demands for transparency about machine learning decisions. The concept is simple: like humans, algorithms should explain the rationale behind their decisions so that their fairness can be assessed. Although this approach is promising in a local context (for example, when the model creator explains the model during debugging at training time), we argue that this reasoning cannot simply be transposed to a remote context, where a model trained by a service provider is accessible to a user only through a network and its application programming interface. This is problematic, as the remote setting is precisely the target use case requiring transparency from a societal perspective. Through an analogy with a club bouncer (who may provide untruthful explanations when rejecting a customer), we show that providing explanations cannot prevent a remote service from lying about the true reasons behind its decisions. More precisely, we observe the impossibility of remote explainability for single explanations by constructing an attack on explanations that hides discriminatory features from the querying user. We provide an example implementation of this attack. We then show that the probability that an observer spots the attack, by collecting several explanations and searching for incoherences, is low in practical settings. This undermines the very concept of remote explainability in general.
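To make the attack described in the abstract concrete, here is a minimal, hypothetical Python sketch. It is not the paper's Algorithm 1, and the class and variable names (`MaliciousService`, `surrogate`, `protected`) are illustrative only: a service decides with a model that uses a protected feature, but answers explanation queries with feature importances from a surrogate trained without that feature, so the returned explanation never cites it.

```python
# Hypothetical sketch (not the paper's Algorithm 1): a remote service that
# decides with a biased model but explains via a sanitized surrogate, so
# explanations never mention the discriminatory feature.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
protected = 0                                            # index of the discriminatory feature
y = (X[:, protected] + 0.2 * X[:, 1] > 0).astype(int)    # decisions driven largely by it

class MaliciousService:
    """Decides with the biased model, explains with a sanitized surrogate."""
    def __init__(self, X, y, protected):
        self.protected = protected
        self.biased = DecisionTreeClassifier(max_depth=3).fit(X, y)
        X_clean = np.delete(X, protected, axis=1)
        # The surrogate imitates the biased decisions but never sees the protected feature.
        self.surrogate = DecisionTreeClassifier(max_depth=3).fit(
            X_clean, self.biased.predict(X))

    def decide(self, x):
        # The actual (remote) decision still uses the protected feature.
        return int(self.biased.predict(x.reshape(1, -1))[0])

    def explain(self, x):
        # The explanation is computed on the sanitized model only.
        importances = self.surrogate.feature_importances_
        return {f"feature_{i if i < self.protected else i + 1}": float(w)
                for i, w in enumerate(importances)}

service = MaliciousService(X, y, protected)
x = X[0]
print(service.decide(x), service.explain(x))  # the explanation omits feature_0
```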


Fig. 1: Illustration of our model.
Fig. 2: The three scenarios involving remote explainability.
Fig. 3: Illustration of a possible implementation (Algorithm 1) of the PR attack.
Fig. 4: Percentage of label changes when swapping the discriminative features in the test set data for scenario B.
Fig. 5: Confidence level as a function of the number of tested input pairs, based on the German Credit detection probability in Fig. 4.
Fig. 6: Probability of finding an incoherent pair (IP), as a function of \({\mathbb{P}}(B)\), the probability of success for a non-discriminated group.
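Figures 4 and 5 relate a per-pair detection probability to an observer's overall confidence after testing several input pairs. The sketch below is a minimal illustration of that relation, assuming independent tested pairs and an illustrative per-pair detection probability p = 0.05 (not a value taken from the paper):

```python
# Detection confidence after n independently tested input pairs, assuming
# each pair exposes the attack with a fixed probability p (illustrative value).
def detection_confidence(p: float, n_pairs: int) -> float:
    """Probability that at least one of n_pairs independent tests spots the attack."""
    return 1.0 - (1.0 - p) ** n_pairs

for n in (1, 10, 50, 100):
    print(n, round(detection_confidence(0.05, n), 3))
```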


Data availability

The data that support the findings of this study (the German Credit dataset) are publicly available at https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data).
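For convenience, a possible way to load the dataset is sketched below. The direct file URL and the column handling are assumptions based on the standard UCI layout (german.data: 20 space-separated attributes followed by a 1/2 class label), not details stated in the paper.

```python
# Illustrative loading of the German Credit data from the UCI repository
# (file URL and label encoding assumed from the standard UCI layout).
import pandas as pd

URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "statlog/german/german.data")
df = pd.read_csv(URL, sep=r"\s+", header=None)
X = df.iloc[:, :-1]                      # 20 attributes
y = (df.iloc[:, -1] == 2).astype(int)    # 1 = bad credit, per the UCI coding
print(X.shape, y.mean())
```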

Code availability

The code used for the experiments is provided at https://github.com/erwanlemerrer/bouncer_problem (https://doi.org/10.5281/zenodo.3907271).


Author information


Contributions

The theoretical framework was developed by E.L.M. and G.T. Experimental work was carried out by E.L.M. and data analysis by G.T.

Corresponding authors

Correspondence to Erwan Le Merrer or Gilles Trédan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Le Merrer, E., Trédan, G. Remote explainability faces the bouncer problem. Nat Mach Intell 2, 529–539 (2020). https://doi.org/10.1038/s42256-020-0216-z

