Label Google Drive files automatically using AI classification

Supported add-ons for this feature: Gemini Enterprise, Gemini Education Premium, and AI Security. Compare add-ons

The AI classification feature uses artificial intelligence (AI) to automatically label your organization’s sensitive content. After an initial training period, during which the AI model learns your organization's criteria for sensitive content, AI classification can automatically apply labels to both new and existing files in Google Drive.

Here's how to get started using AI classification:

1) Set up training: To get started, you create the classification label which the AI model will automatically apply to files once training is done. You also create the training label—a label that's nearly identical to the classification label.

2) Train the model: During the training period, typically about a week, your designated labelers—users at your organization who can evaluate sensitive files—begin classifying Drive files with the training label. From their examples, your model begins learning how to similarly classify sensitive files.

3) Turn on automatic classification: Once the model is trained (after about a week), you're prompted to turn on automatic classification. You can monitor how many files are classified, and how accurately, on an ongoing basis.

For exact details on each phase, go to the linked sections below.

Before you begin

If you’re not familiar with Drive labels, go to Get started as a Drive labels admin for details on how they work and how to create them.
For best results, create a configuration group for your designated labelers that's separate from the rest of your organization. For instructions, go to Customize service settings with configuration groups.

Set up training

Create the classification label

The classification label is the label the AI model will automatically apply to your sensitive Drive files after the model is trained. The model will be trained on and use only one field per label. The AI-set field must be either a badged or option list field type. For more information about labels, go to Get started as a Drive labels admin.

When used as a classification label, an option list or badged field must meet these requirements:

Have at least 2 and no more than 7 options
Must be published

If you have an existing label that meets these requirements, you can use it as a classification label. Otherwise, create a label.
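
As a rough illustration, the requirements above can be written as a small pre-check to run against your own inventory of labels. This is a sketch only: the `LabelField` class, its attribute names, and the field type strings are hypothetical and are not part of any Google API. It encodes only the field type, option count, and published rules stated above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabelField:
    """Hypothetical description of the field that AI classification would set."""
    field_type: str      # assumed values: "badged" or "option_list"
    options: List[str]   # the selectable options in the field
    published: bool      # whether the containing label is published


def classification_field_problems(field: LabelField) -> List[str]:
    """Return a list of unmet requirements (an empty list means the field qualifies)."""
    problems = []
    if field.field_type not in {"badged", "option_list"}:
        problems.append("field must be a badged or option list field")
    if not 2 <= len(field.options) <= 7:
        problems.append("field must have between 2 and 7 options")
    if not field.published:
        problems.append("label must be published")
    return problems
```

An empty result suggests the existing label can be reused as the classification label. Otherwise, the returned messages indicate what to change before selecting it.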

Create the training label

We recommend that you create the training label during label selection (next step), when you can create it automatically. This guarantees the training label will match the classification label in all the required ways.

If you choose to create the training label before label selection:

Make sure the label meets the required label criteria.
Identify the training label with the word "training" to make it easier for your trusted labelers to recognize the label and apply it during the training period.
Add a description field to the training label to further help trusted labelers understand its purpose.

Select labels and enable training

Sign in to your Google Admin console.
Sign in using your administrator account (does not end in @gmail.com).
In the Admin console, go to Menu > Security > Access and data control > Label manager.
In AI classification for Google Drive, click Set up training.
For Select classification label, click Select Label.
Select the label you want AI classification to use and the field it will set.
For Select training label, click Create training label.
This automatically creates a training label with the same attributes as your Classification label.
To make sure the new label is available to your designated labelers, click Update label permissions. This opens the label in Edit mode in label manager in a separate tab.
Note: You can also set label permissions later. But it’s important that only your labelers have access to the training label.
Click Permissions > Edit, then grant the Can apply labels and set values permission to the configuration group that contains your labelers.
Click Save and close the label manager tab.
After selecting both the classification label and training label, the Enable training button is enabled.
Click Enable training.
Important: If you get an error message when you try to enable training, it means your classification label and training label don’t match. Review the label requirements below and make sure your labels meet all requirements, then enable training.

After you enable training, the Data classification page shows your selected Training label and Classification label.

The Classification label shows Not ready. After training is done, the label status changes to Ready.
Auto apply status shows Off for everyone. Once the Classification label status is Ready, you can then change the Auto apply status to On.

Next, your designated labelers need to start applying the Training label to your sensitive files.

Train the model

To successfully train the AI model, your designated labelers should label at least 100 files per option. For example, if your label has 3 options, it should be applied to at least 300 files in total. The AI model checks training every 1–2 weeks and shows Ready once it has 100 or more examples for each label option. Learn more about high-quality examples.
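
The arithmetic above can be sketched as follows. The function names and the shape of the input (a mapping from option name to the number of files labeled so far) are assumptions made for illustration. The only value taken from this article is the minimum of 100 files per option.

```python
from typing import Dict

MIN_FILES_PER_OPTION = 100  # minimum stated in this article


def minimum_training_files(option_count: int) -> int:
    """Total number of files labelers should label across all options."""
    return option_count * MIN_FILES_PER_OPTION


def options_needing_more_files(labeled_counts: Dict[str, int]) -> Dict[str, int]:
    """Map each under-labeled option to the number of files still needed."""
    return {
        option: MIN_FILES_PER_OPTION - count
        for option, count in labeled_counts.items()
        if count < MIN_FILES_PER_OPTION
    }
```

For a label with 3 options, `minimum_training_files(3)` gives 300, which matches the example above.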

During the training period, you can check progress for how many files have been labeled and how the accuracy of the model is improving.

Note: Training files have a 1 million total limit.

To check progress during the training period:

In your Admin console, go to Security > Data classification.
Click View model details.
- For Training label, Training files shows the number of files that have been labeled for each option.
- Each label option has a Score that shows the percentage of training examples the model classified correctly after testing itself.
  - Low: Below 50%. The model needs better data and isn’t ready yet.
  - Medium: 50-80%. The model may be ready on a limited basis.
  - High: Above 80%. The model is ready to classify files for your organization.
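
The score bands above can be expressed as a simple lookup, for example when reviewing scores you have copied out of the model details page. This is a minimal sketch. The function name is hypothetical, and it treats exactly 50% and exactly 80% as Medium, following the ranges listed above.

```python
def score_band(score_percent: float) -> str:
    """Map a per-option score (0-100) to the Low, Medium, or High band."""
    if score_percent < 50:
        return "Low"       # below 50%: the model needs better data
    if score_percent <= 80:
        return "Medium"    # 50-80%: may be ready on a limited basis
    return "High"          # above 80%: ready to classify files
```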

Turn on the auto-apply of labels

After the AI model is trained to achieve a high level of accuracy, you’re ready to choose label options and turn on the auto-applying of labels. Follow these steps:

In your Admin console, go to Security > Data classification.
In AI Classification, verify that the Classification label shows a status of Ready.
Click View model details.
For Classification label, check the boxes for the label options you want to allow the AI model to auto-apply.
Click Turn on auto-apply.
Search for and select the organizational unit or group whose members' files you want labels automatically applied to. For example, if you select the group "Finance", you can then select the labels to be configured for Finance.
Click On - Label is auto-applied.
Options for how the label is applied are listed under the On option.
Click Save.
On the Data classification main page, the Auto-apply status for the rule changes to On.

When does AI Classification scan files?

AI Classification scans files at rest at least once for users and shared drives that have auto-apply enabled. This process can take 1-2 weeks after auto-apply has first been enabled.

AI Classification also scans files when they are uploaded or modified. The applied label may change based on content changes to the file.

Monitor AI classification label events in the Drive log

You can get specific details on how AI classification is labeling files by looking at events recorded in the Drive log.

Go to Security > Data classification.
In AI classification for Google Drive, click View model details.
Click View logs.
The Security Investigation Tool opens in a new tab, showing search results for the Drive log for two AI Classification-related events: Label applied and Label field value changed.
Click on the event Description to get additional details, such as:
- Name and type of the document that was labeled
- Label field value assigned to the document (for example, Confidential or Restricted, if those are your label options).

Turn off the auto-apply of labels

You can turn off the auto-apply of all labels, or turn off specific options.

Go to Security > Data classification.
In AI classification for Google Drive, click View model details.
- For Classification label, uncheck Allow in the Auto-apply column to pause auto-apply for that option.
- To completely pause auto-apply, uncheck all options.

Turn off auto-apply completely for specific organizational units or groups

Use this option if you want to turn auto-apply completely off for content owned by users in specific organizational units or groups.

Go to Security > Data classification.
In AI classification for Google Drive, click View model details.
Click Manage auto-apply.
Click an organizational unit or group at left to select it.
In Manage AI auto-apply, click OFF.

Reset the model

At some point, you may need to reset the model (for example, to start another test, or because model accuracy is not improving). If you need to reset the model, note the following:

If you reset the model, you must wait for the new model to finish training before AI classification can turn on the new classification label and apply it to files.
Previously applied training labels remain on the files. After resetting the model, you can choose to configure a new model to use the same training label (or a different one).
Automatically applied labels remain on the files after you reset the model.
If you choose the same classification label for the new model, the AI classification feature ignores and overwrites the predictions of previous models. In this way, you can use the model reset to "reprocess" your organization's Drive files. This can be useful if you made significant improvements to model quality since your initial deployment.

Go to Security > Data classification.
In AI classification for Google Drive, click View model details.
On the AI model details page, for Actions at right, click Reset model.
The Reset model dialog lists the effects of resetting the model.
To continue, click Reset model.
AI classification is reset to its initial state. To restart, click Set up training and pick new classification and training labels.

FAQ


What are the requirements for the training and classification labels?

Both the classification label and the training label must meet the following criteria:

Contain a minimum of 2, and a maximum of 7 options.
Have their options in the same order in each label. For example, if the classification label has options in this order:
- 1. Option 1
- 2. Option 2
- 3. Option 3
The training label options can’t be ordered as follows:
- 1. Option 2
- 2. Option 1
- 3. Option 3
Have labels that are published.
Have labels with different access permissions. The training label should be available only to designated labelers who can be trusted to train the model. The classification label can have broader access.
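
The matching rules above can be sketched as a comparison of the two labels. The `Label` class and the function name are hypothetical. The sketch also assumes that options are compared by their display names, which this article does not state explicitly. It doesn't model access permissions, which you still need to review in the label manager.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Label:
    """Hypothetical, simplified view of a Drive label with one option field."""
    name: str
    options: List[str]   # options in the order shown in the label manager
    published: bool


def label_pair_problems(classification: Label, training: Label) -> List[str]:
    """Return the criteria that the pair of labels fails to meet."""
    problems = []
    for label in (classification, training):
        if not 2 <= len(label.options) <= 7:
            problems.append(f"{label.name}: must have 2 to 7 options")
        if not label.published:
            problems.append(f"{label.name}: must be published")
    # List equality is order-sensitive, so reordered options are reported.
    if classification.options != training.options:
        problems.append("options must appear in the same order in both labels")
    return problems
```

With the example above, a training label ordered Option 2, Option 1, Option 3 is reported as a mismatch, which corresponds to the situation where enabling training returns an error.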

Can I use the classification label as the training label?

No. The classification label and the training label must be different. The label you choose as your classification label is not displayed as a selectable choice for the training label.

What are good files for the model to train on?

For best results in training the model, your trusted labelers should follow these guidelines when choosing training files:

Ensure that each file has a minimum of approximately 500 text characters.
Select files that best represent actual content that your users create, share, and use in your organization.
Select roughly the same number of files per label option, with a minimum of 100 files for each option. This helps the model to gain a comprehensive understanding of your data and improve scores.
Include a representative variety of files for each option type. For example, don't label 100 resumes as your total set of example files for Top Secret if contracts are also a common Top Secret file type in your organization.
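
As an informal aid, labelers could sanity-check a candidate training set against the measurable guidelines above. This is a sketch under stated assumptions: the function name and input shape are hypothetical, and the imbalance ratio of 2.0 is an arbitrary stand-in for "roughly the same number", which this article doesn't quantify. The guideline about a representative variety of file types needs human judgment and isn't checked.

```python
from collections import Counter
from typing import List, Tuple

MIN_CHARS = 500               # approximate minimum text characters per file
MIN_FILES_PER_OPTION = 100    # minimum files per label option
MAX_IMBALANCE_RATIO = 2.0     # assumption: not a value from this article


def training_set_warnings(files: List[Tuple[str, int]]) -> List[str]:
    """files is a list of (label option, text character count) pairs."""
    warnings = []
    short = sum(1 for _, chars in files if chars < MIN_CHARS)
    if short:
        warnings.append(f"{short} file(s) have fewer than {MIN_CHARS} text characters")
    counts = Counter(option for option, _ in files)
    for option, n in sorted(counts.items()):
        if n < MIN_FILES_PER_OPTION:
            warnings.append(f"{option}: {n} file(s), needs at least {MIN_FILES_PER_OPTION}")
    if counts and max(counts.values()) > MAX_IMBALANCE_RATIO * min(counts.values()):
        warnings.append("file counts per option are unbalanced")
    return warnings
```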

Does AI classification work for labeling only sensitive content?

Sensitive content is the primary focus for AI classification, but any label that meets the label requirements (2 to 7 options) can be trained for automatic labeling.

Can the model train on multiple languages?

The model does support multiple languages; however, a representative variety of files for each option type and language should be included in the training data. This increases the number of files required to successfully train the model. Writing systems without word boundaries, for example, Japanese and Chinese languages, are not yet supported.

How are scores calculated?

During training, the AI model uses 75% of the input data to train itself on how to label files and reserves 25% to periodically test its own performance. In other words, for 25% of the labeled files, the model analyzes those files as if it didn’t know what label has been applied. The AI model then makes its own label choice and compares that choice with the actual label applied by the designated labeler. Each score shows the proportion of reserved files to which the model assigned the correct label.
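
The calculation described above can be sketched as a holdout evaluation. This is an illustration only, not the product's implementation. The function names are hypothetical, and the random shuffle is an assumption, because this article doesn't say how the 25% of files is selected.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split_examples(examples: Sequence, train_fraction: float = 0.75,
                   seed: int = 0) -> Tuple[list, list]:
    """Split labeled examples into a training part and a reserved test part."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)   # assumption: random selection
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def per_option_scores(actual: List[str], predicted: List[str]) -> Dict[str, float]:
    """Percentage of reserved files per option that the model labeled correctly."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for labeler_choice, model_choice in zip(actual, predicted):
        totals[labeler_choice] += 1
        if labeler_choice == model_choice:
            correct[labeler_choice] += 1
    return {option: 100.0 * correct[option] / totals[option] for option in totals}
```

With 400 labeled files, the split reserves 100 files for testing, and each option's score is computed only from those reserved files.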

What happens when an option is disabled for auto-apply?

During scanning, if a file is predicted to have an option for which auto-apply is disabled, AI Classification applies no label or field value to the file.

Files that AI classification has previously labeled retain the applied label and option values even after the option is disabled.

How and when does AI Classification revise the auto-applied labels?

Following the creation of the model and the enablement of auto-apply, AI Classification scans and classifies all files at rest for which sufficient text can be extracted. These files are scanned at least once.

AI Classification reprocesses files periodically as content is modified. Content changes may result in a different prediction for a file. When AI Classification has both an old and a new predicted option for a file, it will prefer the option that is higher in the option list. For example, if a field has three options listed in the label manager:

Confidential
Internal
Public

Suppose AI Classification classifies a file as Internal, and the content changes so that the AI Classification model predicts Confidential. In this case, the classification on the file is changed to Confidential. However, if the AI Classification model predicts Public, the classification on the file remains as Internal.

AI Classification does not revise auto-applied labels and field values that have been reviewed or modified by users.
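
The revision rule described above can be sketched as follows. The function name and parameters are hypothetical. The sketch assumes that "higher in the option list" means an earlier position in the list as shown in the label manager, which matches the Confidential, Internal, Public example.

```python
from typing import List, Optional


def revised_option(options: List[str], current: Optional[str], predicted: str,
                   user_modified: bool = False) -> Optional[str]:
    """Return the option a file should carry after a new prediction.

    options is the field's option list in label manager order (highest first).
    """
    if user_modified:
        return current      # reviewed or modified values are not revised
    if current is None:
        return predicted    # no earlier prediction to compare against
    # Prefer whichever option appears higher (earlier) in the option list.
    return min(current, predicted, key=options.index)
```

For example, with `options = ["Confidential", "Internal", "Public"]`, a file classified as Internal moves to Confidential when that is the new prediction, but stays Internal when the new prediction is Public.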

For multiple classification mechanisms, such as AI, DLP rules, and Default Classification, what is the mechanism that takes priority when classifying files?

Data classification is done in the following order:

DLP rule without user overwrite
Manual classification
DLP rule with user overwrite
AI Classification
Default classification

Removing a label or field allows a lower-tier classification mechanism to take effect. For example, a file with a label removed by a user can later have the same label auto-applied by AI Classification.
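
One way to picture this ordering is as a lookup that returns the value from the highest-priority mechanism that currently has one. This is a sketch, not a description of the actual implementation. It reads the list above as highest priority first, and the function name and input shape are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Order taken from the list above, read as highest priority first.
PRIORITY = [
    "DLP rule without user overwrite",
    "Manual classification",
    "DLP rule with user overwrite",
    "AI Classification",
    "Default classification",
]


def effective_classification(values: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """values maps a mechanism name to the option it would set on a file.

    Returns (mechanism, option) for the highest-priority mechanism present.
    """
    for mechanism in PRIORITY:
        if mechanism in values:
            return mechanism, values[mechanism]
    return None
```

Removing the "Manual classification" entry from `values` lets the next mechanism in the list take effect, which mirrors the label removal example above.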

Are there any restrictions to the types of files to which labels can be applied?

Any Drive item can be labeled from Drive. The editor also has a built-in labeling interface.
AI Classification leverages the same indexable text processing as Drive DLP. Any file that Drive can extract indexable text from can be evaluated for AI Classification applied labels. It’s not possible to extract indexable text from every file, so it’s not guaranteed that AI Classification can process every file.
AI Classification requires that a file meets a minimum text threshold before it makes a classification decision. As a result, some files such as very short documents and images with small amounts of text may not get classified.

How does the feature work for users without an eligible license?

As long as some users in the domain have an eligible license that supports AI Classification, your admin can train the model. Files marked with the training label can be owned by any users with a Drive Labels-supported license. The auto-apply functionality only applies labels to files owned by users on an AI Classification-supported license; files owned by users without a supported license are not processed by the AI.
