Five Pillars of Building Responsible AI Systems for Consumer Products
Artificial intelligence has become an integral part of our lives, from smart lights to virtual assistants. The impact of AI is comparable to the advent of the internet. In exchange for convenience, ubiquitous information, and omni-connectedness, we must contend with new dangers like online echo chambers, hacking, and DoS attacks [1]. AI systems bring us similar tradeoffs: on one hand better prediction, better automation, and better personalization [2][3], but at the cost of reduced privacy, fragility, and algorithmically perpetuated bias [4][5][6]. These drawbacks should not halt innovation. Rather, we need an ethical framework for building responsible AI. We break this framework into five pillars:
I. Explainability and Transparency
AI needs to explain its reasoning and be transparent about what it is doing. Explainability comes in three forms: data explainability, model explainability, and decision explainability. Data explainability means the data used to build the models is tested for quality and is appropriate for the context of the model. What do the features mean? Do the feature values represent the population distribution? Model explainability helps us understand how models work. Which features contribute to the model's output, and by how much? How do the predictions vary as the input data varies, e.g., under sensitivity analysis? Once the model is producing results, we need to be able to interpret each individual result, also known as decision explainability. Which features contributed to a particular decision? Would the decision change significantly with a minor change in a feature value?
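As a minimal sketch of these ideas, assuming a scikit-learn-style tabular model (the dataset and feature names here are synthetic), permutation importance answers the model-explainability question of which features matter, and a small input perturbation probes decision-level sensitivity:

```python
# Sketch of model and decision explainability on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Model explainability: which features contribute, and by how much?
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i, importance in enumerate(result.importances_mean):
    print(f"feature_{i}: importance {importance:.3f}")

# Decision explainability: does a minor change in one feature shift the decision?
sample = X_test[:1].copy()
baseline = model.predict_proba(sample)[0, 1]
sample[0, 0] += 0.1  # small perturbation to feature 0
print(f"probability shift: {model.predict_proba(sample)[0, 1] - baseline:+.3f}")
```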
Transparency is a complementary attribute to explainability and can be broken into data lineage and model lineage. Data lineage answers questions of data provenance. Where did the data come from, i.e., what is the source? Is the source reliable? What cleaning steps and transformations were performed on the data? Which features from the data are being pulled in? Model lineage revolves around managing model training and metadata to provide transparency. Why is this particular model being used for the problem? What metric is used to assess performance? What feature engineering was done on the selected features? What are the optimized hyperparameters? What are the different versions of the model? How often is the model updated?
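One lightweight way to capture both kinds of lineage is to log a metadata record alongside every trained model. The sketch below uses illustrative field names, not a formal standard:

```python
# Sketch of a data/model lineage record; all field names and values are
# illustrative assumptions.
import json
from datetime import datetime, timezone

model_lineage = {
    "model_name": "churn_classifier",            # hypothetical model
    "version": "2.3.0",
    "trained_at": datetime.now(timezone.utc).isoformat(),
    "data_sources": ["crm_events", "billing"],   # data lineage: provenance
    "transformations": ["dedupe", "impute_median", "one_hot_encode"],
    "evaluation_metric": "roc_auc",
    "hyperparameters": {"n_estimators": 200, "max_depth": 8},
    "retraining_cadence": "monthly",
}

with open("model_lineage.json", "w") as f:
    json.dump(model_lineage, f, indent=2)
```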
As AI projects have grown in complexity, the problems around explainability and transparency have only grown with them. Gone are the days when black-box models were acceptable. We must now be able to tell consumers and regulators why a model was chosen and why it produced the values it did.
Personas involved with the explainability and transparency aspect are data scientists, risk and governance analysts, business stakeholders, and users, who are cross-functionally responsible and accountable for the model's decisions and data.
II. Accountability and Governance
With models being built on customer data, questions around how the data is used, which models are built, what outputs are produced, and how those outputs are used all need to be answered. This is where accountability comes into play. Companies have long had GRC (governance, risk management, and compliance) platforms, but an AI governance component has now emerged due to the complexity of machine learning models. AI governance requires model and application auditing. This includes documenting model facts, high-level data sources, information about the application's audience, and the business use case the model serves in the organization. With accountability, we introduce new personas who are responsible for different aspects of an AI application and how it interacts with and affects users. Accountability can also be seen as how AI governance is structured, regulated, and maintained. Better governance practice requires standardized processes that run across different industries. In early 2021, cloud providers (including Google, IBM, Microsoft, Amazon, and SAP) came together to set standardized practices for trusted cloud.
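A documented model fact sheet for auditing might look like the following sketch. The fields and values are illustrative assumptions, loosely in the spirit of factsheet-style documentation rather than any formal schema:

```python
# Sketch of a model fact sheet for AI governance auditing.
# All fields and values are illustrative assumptions.
from dataclasses import dataclass, field, asdict

@dataclass
class ModelFactSheet:
    business_use_case: str
    intended_audience: str
    data_sources: list
    responsible_owner: str
    approved_by: str
    known_limitations: list = field(default_factory=list)

fact_sheet = ModelFactSheet(
    business_use_case="credit limit recommendation",
    intended_audience="internal underwriting team",
    data_sources=["application_form", "bureau_scores"],
    responsible_owner="lending-ml-team",
    approved_by="model-risk-committee",
    known_limitations=["not validated for thin-file applicants"],
)
print(asdict(fact_sheet))  # serialize for the audit trail
```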
Accountability and governance should be the prime responsibility of the Chief Data Officer. Concerns around how the technology impacts society and people need to be addressed by researchers with sociology or psychology backgrounds. Aspects of law and policy need to be addressed by subject matter experts working with the consumer product team to understand the logistics of the application and its users. Governance and policies can differ across geographical locations based on governmental laws, which must be strictly followed. Any data curation should be done with the localized policies in mind.
To understand what accountability and governance for an AI solution look like in practice, let's walk through an example. Say an organization operates globally and has users across the globe. Different countries have different laws and policies for governance, and the organization needs to adhere to them depending on the geography of its users. Measures that fall under GDPR must be strictly followed, and the organization needs guardrails set up in its AI pipeline to ensure compliant operations. If the organization builds global models that use customer data from around the world, it must do so in compliance with the applicable data protection policies. Data lineage and model lineage can be documented so that governance officials can ensure accountability for the models the organization builds.
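Such a pipeline guardrail can be as simple as the following sketch, which keeps records from regulated regions out of training unless explicit consent is recorded. The region codes and consent logic are assumptions for illustration, not legal advice:

```python
# Sketch of a geography-aware guardrail in a training pipeline.
# Region codes and consent logic are illustrative assumptions.
GDPR_REGIONS = {"EU", "EEA", "UK"}

def allowed_for_training(record: dict) -> bool:
    """Keep a record only if regional policy permits its use."""
    if record["region"] in GDPR_REGIONS:
        # GDPR-style rule: require explicit, recorded consent.
        return record.get("consent_to_processing", False)
    return True

records = [
    {"user_id": 1, "region": "EU", "consent_to_processing": True},
    {"user_id": 2, "region": "EU", "consent_to_processing": False},
    {"user_id": 3, "region": "US"},
]
training_set = [r for r in records if allowed_for_training(r)]
print([r["user_id"] for r in training_set])  # -> [1, 3]
```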
Personas involved with the accountability and governance aspect are auditing officers, law professionals and senior executives who are responsible for the overall reputation and ethical functioning of the organization.
III. Privacy
As part of the privacy pillar, organizations need to protect personally identifiable and sensitive information. This starts with collecting data from trusted sources and applying techniques like differential privacy or, at the very least, de-identification. Data can also leak through proxy features that reveal sensitive information: height and weight can reveal gender, and zip code can reveal race.
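As a minimal sketch of the differential-privacy idea, the classic Laplace mechanism adds calibrated noise to an aggregate query so that no single individual's presence can be inferred from the released value. The epsilon budget below is an illustrative choice:

```python
# Sketch of the Laplace mechanism for differential privacy on a count query.
# The epsilon value is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

def private_count(values, epsilon=1.0):
    """Release a count with Laplace noise; the sensitivity of a count is 1."""
    sensitivity = 1.0
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return len(values) + noise

ages_over_60 = [a for a in [34, 71, 66, 45, 80] if a > 60]
print(f"noisy count: {private_count(ages_over_60):.2f}")  # true count is 3
```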
Organizations need to articulate clear privacy policies, communicate them to users, and abide by them. As an example, consider an organization that collects user data from mobile phones, such as audio, camera, and keyboard searches. The application needs to tell users what data is being collected and give them the option to opt out if they don't wish to share that information. If the organization collects data without the user's knowledge, it is violating data privacy rules. Similarly, it is the organization's responsibility not just to store personally identifiable information securely, but also to ensure that such information isn't passed into the analytical platform and used for modeling or analytics.
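One way to enforce that last point, sketched below with illustrative column names, is to strip or hash PII columns before any record reaches the analytics or modeling layer:

```python
# Sketch of stripping PII before data enters the analytics/modeling layer.
# Column names, the salt, and the record shape are illustrative assumptions.
import hashlib

PII_COLUMNS = {"name", "email", "phone"}

def de_identify(record: dict) -> dict:
    """Drop raw PII; keep only a salted hash of the user id for joins."""
    clean = {k: v for k, v in record.items() if k not in PII_COLUMNS}
    clean["user_key"] = hashlib.sha256(
        ("fixed-salt:" + str(record["user_id"])).encode()
    ).hexdigest()[:16]
    del clean["user_id"]
    return clean

raw = {"user_id": 42, "name": "Jane Doe", "email": "jane@x.com", "plan": "pro"}
print(de_identify(raw))  # only non-PII fields plus the hashed key survive
```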
Personas responsible for ensuring privacy are data privacy and security officers under the supervision of the Chief Data Officer.
IV. Robustness
While AI-powered systems can augment human decision-making and improve outcomes, they are not infallible and may be vulnerable to adversarial attacks. This raises security concerns that can compromise people's confidence in the systems. While the technical community exposes and fixes vulnerabilities in software systems on an ongoing basis, attacks on AI-powered systems pose new challenges. The attacks can be classified into four categories: poisoning, inference, evasion, and extraction.
In a poisoning attack, the attacker tampers with the training data to change how the model predicts. For example, the attacker can plant a backdoor: a trigger pattern, undetectable by human eyes, that makes the model produce an attacker-chosen output whenever the trigger appears in the input. An inference attack, or membership inference attack (MIA), is one in which the attacker identifies the data used to train the model; this creates a major privacy concern. The attacker runs many inputs through the scoring pipeline and tries to reconstruct the training samples. An evasion attack is a kind of adversarial attack in which the inputs to a model are carefully perturbed at scoring time to produce a completely unexpected result. An extraction attack compromises confidentiality: the attacker has access to a proprietary model, say through an API, and sends inputs to harvest outputs. For example, if an organization has a credit risk model deployed, the attacker can query the scoring API repeatedly to reconstruct a close copy of the model.
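To make the evasion category concrete, here is a minimal sketch of a fast-gradient-sign-method (FGSM) style perturbation against a simple logistic-regression scorer, using the analytic gradient. The weights and inputs are toy values; real attacks target deployed deep models:

```python
# Sketch of an FGSM-style evasion attack on a logistic-regression scorer.
# Weights, bias, and inputs are toy values for illustration.
import numpy as np

w = np.array([1.5, -2.0, 0.5])   # toy model weights
b = 0.1

def score(x):
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))  # sigmoid probability

x = np.array([0.2, -0.1, 0.4])
y = 1.0                           # true label
p = score(x)

# Gradient of the logistic loss w.r.t. the input is (p - y) * w.
grad_x = (p - y) * w

epsilon = 0.3                     # perturbation budget
x_adv = x + epsilon * np.sign(grad_x)

print(f"clean score:       {score(x):.3f}")
print(f"adversarial score: {score(x_adv):.3f}")  # pushed toward the wrong class
```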
To make sure the machine learning pipeline is robust, cybersecurity and data security experts need to build guardrails. Robustness is a cybersecurity concern and would be handled by cybersecurity professionals. Their role is to ensure that the data is protected with IAM tokens, that access rights are enforced, and that the model API isn't exposed to attacks. To protect models from adversarial attacks, data scientists can train them on noisy or adversarially perturbed data.
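A lightweight sketch of that training-side defense: augment the training set with noisy copies of each example so the model sees perturbed inputs during training. The noise scale and copy count are illustrative choices:

```python
# Sketch of noise-based training augmentation for robustness.
# The noise scale and number of copies are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def augment_with_noise(X, y, copies=2, scale=0.05):
    """Return the original data plus `copies` noisy versions of each row."""
    noisy = [X + rng.normal(0.0, scale, size=X.shape) for _ in range(copies)]
    X_aug = np.vstack([X, *noisy])
    y_aug = np.concatenate([y] * (copies + 1))
    return X_aug, y_aug

X = np.array([[0.2, -0.1], [0.4, 0.3]])
y = np.array([0, 1])
X_aug, y_aug = augment_with_noise(X, y)
print(X_aug.shape, y_aug.shape)  # (6, 2) (6,)
```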
V. Fairness and De-biasing
AI-powered applications are used in many facets to assist, automate, and augment. In many cases, AI makes high-stakes decisions like credit limit approvals, hiring decisions, or facial recognition. In these situations, decision making needs to be particularly sensitive to bias. Bias can appear in models due to unrepresentative training data or inherently biased societal norms. De-biasing models, or introducing fairness into them, is a question of values, not just scientific accuracy; the factors considered are subject to interpretation. Companies need to establish a standard for what constitutes fairness. With this critical definition established, there are many algorithms, both open-source and proprietary [7], that can help companies de-bias their models to make them more fair and inclusive.
Let's look at a real-world example: Amazon's hiring AI. Amazon built a model to automate resume screening. Soon after it was put into use, the company realized there was a major bias in the system. On inspection, it was found that the model had been trained on resumes submitted to Amazon over the previous 10 years. Statistically, the tech industry is male-dominated, with 60-80% male representation globally. The model learned that men were preferred over women and penalized resumes containing keywords related to "women". Even after attempts to modify the model, it couldn't be guaranteed that the model would not learn a new biased pattern, so the project was shelved.
There are several metrics that can be used to understand whether a model is biased with respect to a particular feature. If it is, we need to employ measures to de-bias it. Fairness and de-biasing is a technical aspect of model building and hence needs to be addressed by data scientists as a required persona. For model fairness, ethics researchers also need to ensure that the model is not being used to make unethical decisions.
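One common bias check, sketched below with illustrative group labels and toy predictions, is the disparate impact ratio: the rate of favorable outcomes for a protected group divided by the rate for the reference group, with values below roughly 0.8 often flagged for review:

```python
# Sketch of a disparate impact check on model decisions.
# Group labels and predictions are illustrative toy data.
import numpy as np

predictions = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # 1 = favorable outcome
group       = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

rate_a = predictions[group == "A"].mean()   # reference group
rate_b = predictions[group == "B"].mean()   # protected group

disparate_impact = rate_b / rate_a
print(f"disparate impact ratio: {disparate_impact:.2f}")
# A common rule of thumb flags ratios below ~0.8 for review.
```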
Conclusion
With this framework, any consumer product team should be able to build guardrails, lay down policies, and build data science pipelines in a trustworthy manner. Why is this more important now than ever? First, with technological advancements, the applications companies build are more visible than ever; companies are accountable for what they build, at various levels and to various parties, and not having a responsible framework for building AI applications can damage a company's brand reputation. Second, as companies face increasing regulation, their actions need to be accountable. Third, with the increasing complexity of models, it is important to have checkpoints to ensure these applications don't have a drastic negative impact on society. Lastly, humans come before technology, and social justice matters.
References