Navigating the Maze: Vetting AI Vendors for Privacy and Data Protection

by Alan Koenig and John Lee

We have all seen over the last few years how artificial intelligence (AI) is revolutionizing industries from finance to education. The importance of selecting trustworthy AI vendors cannot be overstated. Beyond the capabilities of the AI itself, privacy and data protection considerations are paramount. Whether you're integrating a language model like OpenAI's GPT-4o or Anthropic's Claude 3.5 Sonnet into your systems or exploring other AI solutions, understanding how your data and interactions are handled is crucial.

Understanding AI Infrastructure

When vetting AI vendors, one of the first questions to ask is where and how the AI model is housed. Publicly available services like ChatGPT and Copilot are typically hosted on cloud platforms managed by the vendor (OpenAI and Microsoft, respectively). Many data sets and AI large language models (LLMs) are open source and can be found on hubs like Hugging Face, where the machine learning community comes together to collaborate on models, data sets, and tools. At the very least, it is essential to clarify who has access to the data powering these models and who controls visibility into the interactions users have with the AI.
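One way to retain full control over hosting is to run an open-source model on infrastructure you manage, so prompts and documents never leave your environment. Below is a minimal sketch using Hugging Face's transformers library, with gpt2 as a small stand-in model; any vetted open model from the Hub could take its place.

```python
# A minimal sketch of running an open-source model on your own hardware with
# Hugging Face's transformers library. "gpt2" is a small stand-in model;
# substitute whichever open model your organization has actually vetted.
from transformers import pipeline

# Downloads the weights once, then runs inference locally; no prompt text
# is sent to a third-party API.
generator = pipeline("text-generation", model="gpt2")

prompt = "Before adopting an AI vendor, our checklist should include"
outputs = generator(prompt, max_new_tokens=40, num_return_sequences=1)
print(outputs[0]["generated_text"])
```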

Data Handling and Privacy Concerns

For organizations uploading documents or sensitive data for the AI to process, it's crucial to know where those documents will reside. Are they stored securely? Does the vendor segregate your data from other clients' information? And, more importantly, is your uploaded data used to further train the AI model? Ideally, data uploaded for specific tasks should not be used to improve the general model unless explicitly agreed upon. Be sure to read the terms and conditions covering data management and the protections your vendor provides.
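To make these questions concrete during due diligence, they can be tracked as a simple checklist. A minimal sketch follows; the questions and the example answer are illustrative only and are not drawn from any real vendor's terms.

```python
# A hedged sketch: the data-handling questions above, tracked as a simple
# due-diligence checklist. Questions and answers are illustrative only.
checklist = {
    "Where will our uploaded documents physically reside?": None,
    "Is our data logically segregated from other clients' data?": None,
    "Is our data used to train or fine-tune the general model?": None,
    "Can training use be excluded in the contract, in writing?": None,
    "What is the retention period, and can we request deletion?": None,
}

def open_questions(items: dict) -> list:
    """Return every question the vendor has not yet answered in writing."""
    return [q for q, answer in items.items() if answer is None]

# Example of recording a written answer as it comes in (illustrative).
checklist["Is our data logically segregated from other clients' data?"] = (
    "Yes, per-tenant encryption keys (verified in vendor's audit report)"
)

for q in open_questions(checklist):
    print("OPEN:", q)
```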

Protecting Personally Identifiable Information (PII)

Safeguarding PII is non-negotiable. When AI interactions involve individuals, as in customer service or educational applications, vendors must ensure strict protocols are in place to prevent the inadvertent exposure of sensitive information. This is especially critical for applications used by children under 18, where additional regulations like COPPA (the Children's Online Privacy Protection Act) in the United States come into play. Contracts should cover these privacy concerns, and users (and their parents or guardians) must be made aware of what constitutes PII, along with best practices for interacting with the AI model. De-identifying data before uploading it to AI services is a worthwhile preventative step, especially when the identifying details are not needed to get the results you expect from the model. Even de-identification is not always enough: a combination of background variables (quasi-identifiers such as age, ZIP code, and occupation) can sometimes re-identify individuals even after their names are removed.
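As an illustration of pattern-based de-identification, the sketch below redacts a few common PII formats before text leaves your systems. The patterns are simplified examples, not production-grade rules; note that the name "Jane" survives redaction, which is exactly the gap that dedicated de-identification tools and human review are meant to close.

```python
# A minimal de-identification sketch: redact common PII patterns (emails,
# US phone numbers, SSN-like strings) before text is sent to an external
# AI service. Pattern-based redaction is a starting point, not a guarantee.
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each matched pattern with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

print(redact("Reach Jane at jane.doe@example.com or 555-867-5309."))
# -> Reach Jane at [EMAIL REDACTED] or [PHONE REDACTED].
# Note: the name "Jane" is untouched; regexes alone miss names, addresses,
# and quasi-identifiers, which call for NER-based tools or human review.
```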

Transparency and Access Control

The most crucial aspect of vetting AI vendors revolves around visibility and control. Who can access the data sources and the traffic generated by interactions with the AI? Organizations must have clarity on how their data is managed, who within the vendor organization has visibility into it, and what measures are in place to prevent unauthorized access or breaches. Third-party vendors warrant additional financial scrutiny as well, particularly since the field is in constant flux and vendors may be acquired or disappear.
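One concrete artifact to request from a vendor is a role-by-role map of who can see your data. The sketch below models such a map; the roles and permissions are hypothetical, invented only to show the kind of explicit, auditable answer you should expect.

```python
# A hypothetical role-to-permission map, illustrating the kind of explicit,
# auditable access-control answer to request from a vendor. These roles and
# permissions are invented for illustration, not any vendor's actual policy.
ROLE_PERMISSIONS = {
    "customer_admin":  {"read_own_data", "delete_own_data", "view_audit_log"},
    "vendor_support":  {"view_audit_log"},   # metadata only, never content
    "vendor_engineer": set(),                # no standing access to customer data
}

def has_permission(role: str, permission: str) -> bool:
    """True if the given role is granted the given permission."""
    return permission in ROLE_PERMISSIONS.get(role, set())

assert has_permission("customer_admin", "read_own_data")
assert not has_permission("vendor_engineer", "read_own_data")
```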

Compliance and Certification

A responsible AI vendor should be able to demonstrate compliance with relevant data protection regulations (such as the CCPA in California) and industry standards. Look for certifications or audits, such as SOC 2 Type II reports or ISO/IEC 27001 certification, that attest to their commitment to privacy and security practices.

Contingency Plans if AI Vendors Get Hacked

Be aware of the breach-reporting requirements your AI vendors should be held to, as well as your own obligations to report to stakeholders. If a vendor is hacked, you do not want to improvise: plan ahead by setting the policies that specify which notifications must be executed and within what time frame.
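Those time frames can be written down as executable policy rather than left in a document no one reads during an incident. A minimal sketch follows: the 72-hour regulator window mirrors the GDPR's Article 33 rule for notifying a supervisory authority, while the other windows are placeholders for whatever your contracts and regulators actually require.

```python
# A hedged sketch of breach-notification policy as code: given the time a
# breach was discovered, compute the deadline for notifying each audience.
# The 72-hour window mirrors GDPR Article 33; the others are placeholders.
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOWS = {
    "supervisory authority (GDPR Art. 33)": timedelta(hours=72),
    "internal stakeholders":                timedelta(hours=24),  # placeholder
    "affected customers":                   timedelta(days=7),    # placeholder
}

def deadlines(discovered_at: datetime) -> dict:
    """Map each audience to the latest time it must be notified."""
    return {who: discovered_at + window
            for who, window in NOTIFICATION_WINDOWS.items()}

discovered = datetime(2024, 6, 1, 9, 0, tzinfo=timezone.utc)
for who, due in deadlines(discovered).items():
    print(f"Notify {who} by {due.isoformat()}")
```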

Conclusion

While integrating AI into your organization can offer transformative benefits, the journey begins with rigorous vendor vetting. Paying close attention to how AI models are housed, who controls data visibility and access, and how privacy concerns are addressed can mitigate risks and ensure compliance with data protection regulations. By prioritizing these considerations, organizations can harness AI’s potential while safeguarding sensitive information and maintaining trust with stakeholders. Protecting privacy isn’t just a legal requirement—it’s a fundamental aspect of ethical and responsible deployment.

