How to Upgrade Data Security Before Adding AI: Five Steps

In a world where quick access to information is invaluable, Microsoft 365 Copilot allows corporate customers in the United States to rapidly gain insights into a variety of critical questions, personalized to their behavior. However, while these new artificial intelligence tools are helpful for generating approaches and content from natural language queries, they also risk exposing sensitive data or violating compliance regulations. It is critical to establish foundations at scale, across the organization, that provide assurance that your data is secure and protected from unauthorized access, modification, or leakage.

These controls do not have to be complex. BDO Digital has outlined five steps below that can help define a clear strategy for more solid data security foundations.


Step #1: Refresh Your Inventory of Sensitive and Stale Information

One of the challenges of managing sensitive information is knowing where it is stored and how it is classified. To tackle this, it’s critical to find the right tool for the organization that will help locate and categorize this data. Microsoft Purview, as an example, helps discover, catalog, and classify data across on-premises, cloud, and hybrid environments. With Purview, data sources can be scanned and automatically labeled based on predefined or custom rules. Purview can also be used to create a business glossary and map data assets to business terms. This allows for a clear and consistent view of the data landscape and helps prepare it for AI implementation.

Before organizations use generative tools like Microsoft Copilot, they should ensure that their data is properly labeled and protected according to compliance requirements. Purview can help identify and classify sensitive data, such as personal information, financial data, or health records, and apply the appropriate policies to control access and encryption. Purview can also be used to monitor data lineage and track changes to, and usage of, data over time.
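As a rough illustration of what labeling data "based on predefined or custom rules" can mean, the Python sketch below tags text with sensitivity labels using simple pattern rules. It is not Purview's API or rule syntax. The rule names, the patterns, and the classify function are all hypothetical, and a production classifier would be considerably more thorough.

```python
import re

# Hypothetical custom rules: label name -> pattern. Illustrative only;
# this is not Microsoft Purview's rule syntax or its built-in classifiers.
RULES = {
    "Personal": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN-like number
    "Financial": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),  # 16-digit card-like number
    "Health": re.compile(r"\b(?:diagnosis|patient id)\b", re.IGNORECASE),
}


def classify(text: str) -> list[str]:
    """Return the sorted labels whose pattern matches the text."""
    return sorted(label for label, pattern in RULES.items() if pattern.search(text))
```

A document that matches no rule comes back with an empty list, which in this sketch would mean it needs no special handling. A document that matches several rules carries every matching label.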


Step #2: Target the Data Life Cycle to the Portion of Value

Proactively reduce risk by identifying the realistic value-creating life cycle of data. It may be tempting to think that all data will eventually have some value. In reality, although certain trend data holds its value over time, most business information is not touched again after one year, and some types of communications are not touched again after even a week. Beyond these windows, the vast majority of information becomes redundant, outdated, and trivial (ROT) data held within your tools, slowing access to the information that matters.

Microsoft Data Lifecycle Management (DLM) can help manage data efficiently and securely. DLM helps define and apply retention policies to data, based on factors such as age, type, or source. DLM can also be used to archive or dispose of data that is no longer needed or relevant, while preserving the data that is valuable or required for legal purposes. By reducing ROT, organizations can improve data quality and accuracy and better prepare their data for AI applications.
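The retention logic described above can be sketched in a few lines of Python. This is a minimal illustration and not how DLM is configured. The Item type, the RETENTION table, and the disposition function are hypothetical, and the one-week and one-year windows are assumptions that simply reuse the usage patterns mentioned in this step.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention windows per content type (assumed values,
# not defaults from any Microsoft product).
RETENTION = {
    "chat": timedelta(days=7),
    "document": timedelta(days=365),
}


@dataclass
class Item:
    name: str
    kind: str              # key into RETENTION
    last_touched: date
    legal_hold: bool = False


def disposition(item: Item, today: date) -> str:
    """Return 'retain' or 'dispose' for a single item."""
    if item.legal_hold:
        return "retain"    # required for legal purposes, regardless of age
    window = RETENTION.get(item.kind)
    if window is None:
        return "retain"    # unknown type: keep until someone reviews it
    return "dispose" if today - item.last_touched > window else "retain"
```

Checking the legal hold first reflects the point that required data must be preserved even when it is otherwise stale. Defaulting unknown types to "retain" is a conservative design choice for the sketch, not a recommendation.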


Step #3: Tighten Policies To Control Sensitive Information

With discovery complete and data risks proactively identified across the organization, it’s time to consider where controls can be made more effective. For example, labels, encryption, watermarking, or digital rights management can be used to protect sensitive or confidential data from unauthorized access or disclosure. Microsoft Purview and custom integrations to AI in Microsoft Azure consume tenant data and create new information in the organization’s controlled cloud environment. Data classification can help ensure that only authorized and trusted users, applications, or devices can access and process company data, and that the data is aligned with business objectives and legal obligations.

A focused review of how data flows into and out of these tools can inform policy adjustments and new controls that reduce the risk of data over-sharing.


Step #4: Identify Problematic Team(s) or Program(s) for Accidental Discovery

Not every part of an organization’s data should be easily discoverable with AI tooling or used to synthesize new responses. Teams like internal audit, those that handle sensitive programs, and corporate investigations should have private internal sites or entire portions of the organization excluded from discovery without decreasing the value for the larger departments they are part of. Preparing these kinds of exclusions requires an intentional inventory of the teams or job roles that could hold this kind of extremely restricted information, followed by technical controls that proactively reduce or remove their content from results.

As opposed to “data loss prevention” policies, which typically censor based on content, these controls require different safeguards that instruct AI tools to ignore specific repositories or identities completely as part of the “index” of information used to generate results.
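The difference from content-based filtering can be shown with a short Python sketch. It assumes a simplified indexing pipeline that is not taken from any Microsoft product. The Document type, the exclusion sets, and the indexable function are hypothetical, and in practice such exclusions would be configured in the platform's own controls and not in application code.

```python
from dataclasses import dataclass

# Hypothetical exclusion lists built from the inventory of restricted
# teams and roles; the names below are placeholders.
EXCLUDED_SITES = {"internal-audit", "corporate-investigations"}
EXCLUDED_OWNERS = {"investigator@example.com"}


@dataclass(frozen=True)
class Document:
    site: str
    owner: str
    text: str


def indexable(docs: list[Document]) -> list[Document]:
    """Keep only documents whose site and owner are not excluded.

    The decision never inspects doc.text, unlike a content-based DLP rule.
    """
    return [
        d for d in docs
        if d.site not in EXCLUDED_SITES and d.owner not in EXCLUDED_OWNERS
    ]
```

Because the filter looks only at where a document lives and who owns it, an innocuous-looking file in an excluded repository stays out of the index, while ordinary department content remains available.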


Step #5: Build the Foundations

To protect sensitive organizational data from being exposed or misused by AI tools, organizations need to identify the areas of potential risk and take steps to reduce those risks. These foundations do not have to be complex and can be quickly planned and implemented with the help of a business and/or consulting partner. This way, organizations can ensure that only authorized and relevant information is used to generate responses or insights for business needs. By implementing these best practices, organizations can build a strong foundation to begin their AI journey and leverage the benefits of natural language processing without compromising data security.

Contact BDO Digital today to help get started with your data security foundations.