Is the Office 365 Compliance Center Really Good Enough?




Today, organizations work in a very agile environment. Users increasingly leverage software as a service (SaaS) productivity toolsets to complete daily workloads. Products like Office 365 and Google Suite offer users feature-rich environments with a wide array of utilities to make their daily lives easier. These tools provide email, social media, productivity applications, collaboration and convenient storage mechanisms. 

While these are great toolsets for users, they create greater compliance and data privacy risk for companies. How do compliance, legal and security teams cope with these tools while regulations grow more stringent? Both Google Suite and Microsoft Office 365 offer basic utilities to help organizations meet compliance and governance challenges; however, these tools fall short of best practices and can be cumbersome to use.

What is eDiscovery and Supervision/Compliance? eDiscovery is the process of seeking and finding relevant information in electronic format. It is typically done in response to legal matters and investigations. Supervision/Compliance is a similar function, but it is typically reserved for financial institutions that are required to monitor communications between financial broker dealers and their customers.

What tools does an organization need in order to meet compliance and governance challenges? The most common answer is compliance and eDiscovery applications designed specifically to search for the relevant data that must be presented in legal matters or in response to data privacy compliance requests. Microsoft’s approach is to provide administrators with the Compliance Center. This toolset can help legal and compliance teams get a great deal of their work done; however, there are some cautions that organizations need to be aware of before deciding on a process for completing the required searches.


What if the indexing tool does not index the entire content of documents, or does not index the type of document that needs to be searched? The unfortunate answer is that if everything is not indexed, a search cannot retrieve all the responsive data needed, which is obviously a problem. So, let’s look at the documents that are indexed by the email system in Office 365. Please see the table below from the Microsoft website.


[Table: file formats indexed by Exchange Search, from the Microsoft website]

 



(The link provided covers all current versions of Exchange and Exchange Online: https://docs.microsoft.com/en-us/exchange/file-formats-indexed-by-exchange-search-exchange-2013-help. There is another link for SharePoint Online indexed documents: https://docs.microsoft.com/en-us/SharePoint/technical-reference/default-crawled-file-name-extensions-and-parsed-file-types?redirectedfrom=MSDN)

You will notice that the items indexed by Office 365 are primarily Microsoft Office files. This covers a good number of the file types that are created and transmitted daily. However, in the world of text-based file types, this represents a very small number of the text-based files that are currently in use. Compared with several products on the market that index well over 500 different file types, this is an obvious shortfall in Microsoft’s product and an opportunity for us to help make it better.

You will also notice in the table above that PDF files are listed as indexable, and that no graphic image formats appear. Graphic-based PDF files cannot be indexed by the Exchange Online engine, nor can graphic images be processed by the index, because it has no optical character recognition (OCR) capability. This once again opens the door to missing important data. Many image-based PDFs that are “forms” can contain Personally Identifiable Information (PII) or payment card data governed by Payment Card Industry (PCI) standards, both of which are covered by almost every data privacy regulation on the planet. These forms include credit applications, loan applications, passport requests, insurance forms and others. Organizations that cannot locate these documents could be at risk of violating current data privacy rules.

Now that we understand the importance of indexing, what happens when items have not been completely indexed, or are of a type that the index engine cannot index? In the Office 365 suite, these are appropriately referred to as “Partially Indexed Items”. A content search that you run from the Security & Compliance Center automatically includes partially indexed items in the estimated search results. (This simply shines a flashlight on data that is partially indexed; it does not make that data searchable.)

Here are some of the reasons why items can't be indexed for search and are returned as partially indexed items when you run a Content Search:

·        Indexing Limits - Partially indexed items are Exchange mailbox items and documents on SharePoint and OneDrive for Business sites that for some reason weren't completely indexed for search. Most email messages and site documents are successfully indexed because they fall within the Indexing limits for email messages. However, some items may exceed these indexing limits, and will be partially indexed. 

·        Email messages have an attached file of a file type that can't be indexed; in most cases, the file type is unrecognized or unsupported for indexing

·        Image files that require OCR capabilities.

·        Email messages that have an attached file without a valid handler, such as image files; this is the most common cause of partially indexed email items

·        Too many files attached to an email message

·        A file attached to an email message is too large

·        The file type is supported for indexing, but an indexing error occurred for a specific file
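The causes listed above can be turned into a rough triage step before a search is run. The sketch below is illustrative only: the extension sets, the size and attachment-count limits and the `Attachment` type are hypothetical placeholders, not Microsoft's published values, which should be taken from the current documentation.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical values for illustration; take the real ones from Microsoft's docs.
INDEXABLE_EXTENSIONS = {".docx", ".xlsx", ".pptx", ".pdf", ".txt", ".msg"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}
MAX_ATTACHMENT_BYTES = 150 * 1024 * 1024
MAX_ATTACHMENTS = 250


@dataclass
class Attachment:
    name: str
    size_bytes: int


def partial_index_reasons(attachments):
    """Return the reasons a message would likely be only partially indexed."""
    reasons = []
    if len(attachments) > MAX_ATTACHMENTS:
        reasons.append("too many attachments")
    for att in attachments:
        ext = PurePosixPath(att.name).suffix.lower()
        if ext in IMAGE_EXTENSIONS:
            reasons.append(f"{att.name}: image file, requires OCR")
        elif ext not in INDEXABLE_EXTENSIONS:
            reasons.append(f"{att.name}: unsupported file type")
        if att.size_bytes > MAX_ATTACHMENT_BYTES:
            reasons.append(f"{att.name}: attachment too large")
    return reasons
```

A message whose list of reasons is non-empty is a candidate for manual review or for processing in a tool with broader file-type and OCR support.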


 

Although it varies, according to Microsoft’s documentation, most Office 365 customers have less than 1% of content by volume (total number of files) and less than 12% of content by size (size of the files) that is partially indexed. The size figure matters most, because larger files have a higher probability of containing content that can't be completely indexed. Therefore, if your organization has 1 petabyte of data to search through, as much as 120 terabytes of it (12% by size) could be partially indexed, even though those items make up less than 1% of the total number of files. That is a large amount of potentially missed data, and a corresponding risk.
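The arithmetic behind that estimate can be sketched as follows. The function names are hypothetical, the 12% and 1% shares are the upper bounds quoted above, and decimal units (1 PB = 1,000 TB) are assumed.

```python
TB_PER_PB = 1000  # decimal units: 1 PB = 1,000 TB (an assumption)


def partially_indexed_upper_bound_tb(total_pb, share_by_size=0.12):
    """Upper bound, in terabytes, of data that may be only partially indexed."""
    return total_pb * TB_PER_PB * share_by_size


def partially_indexed_upper_bound_files(total_files, share_by_count=0.01):
    """Upper bound on the number of files that may be only partially indexed."""
    return round(total_files * share_by_count)


# A 1 PB estate could hold up to roughly 120 TB of partially indexed data.
print(partially_indexed_upper_bound_tb(1))
```

The two bounds measure different things: one is a share of bytes and the other a share of file count, so they should not be converted into each other.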

Now that we understand some of the challenges that are presented by leveraging Office 365 compliance and eDiscovery tools, let us take a closer look at the Google Suite tools used for the same purpose. 

Google Suite provides a feature-rich set of tools for its users, with collaboration, messaging and ease of use at its core. However, Google Suite provides a limited toolset for compliance and legal teams to respond to today’s demands. When it comes to capturing email, journaling is the gold standard that messaging and legal teams have relied upon for receiving immutable versions of the email stream, containing data that has not been altered or reviewed by the end user. Journaling captures a copy of all correspondence that flows through the message transport. Google Suite only provides the journaling feature in its “Enterprise License”, which is a very important concern and is covered in one of my other white papers, titled “Why do Organizations Need a Third-Party Archive?”. Also missing, or not very scalable, are true financial compliance and eDiscovery tools that meet the needs of medium to large organizations. So, let’s discuss what organizations really need to meet the heavy demands of compliance and eDiscovery workloads.

With so many compliance and eDiscovery software options available to organizations today, it can be challenging to find the solution that best suits the needs of any company.  So, let’s discuss why there are so many challenges associated with compliance and eDiscovery. As we looked at the current offerings from Microsoft and Google, we saw a disparity of features. One tool is feature rich but is lacking in a few areas; the other needs some serious help in order to provide organizations with the tools they need to survive in today’s reality. So, where do we go from here?

Even though compliance and eDiscovery review tools come in two varieties (desktop and cloud-based), all review applications share a common set of core features, with some variations. We have seen this in the Microsoft Compliance Center, where there is a set of tools that allow you to get the job done, to a certain point. But now, let us discuss some details. First, there is “the big three”: recursive parsing, search & index, and tagging & organizing documents. All enterprise tools have these core features, but let’s dive into some extra components that can be extremely valuable. The more features a tool has, the easier the search, review and presentation of the data will be.

1. Document Viewing Options

When you open a document within the tool, it may be displayed as a chunk of plain text or rendered much as it would appear in its original application. When reviewing documents, being able to view a styled version without opening the original document is very useful. Remember, opening the original document is rarely a good idea, as you could inadvertently change the file or its metadata and expose your computer to a virus or malware.

2. Redaction

Redaction is the removal of some document content, typically by replacing it with black rectangles that indicate the removal. For example, originally classified documents released under freedom of information legislation may have sensitive information redacted in this way. eDiscovery and compliance tools that offer this feature can also include several options, such as find and redact, which helps operators avoid missing information that must be redacted.
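To make the "find and redact" idea concrete, here is a minimal sketch that locates sensitive strings by pattern and blacks them out. It does not represent how any particular product implements redaction; the two patterns (a US Social Security number format and a simplified email address) and the function name are assumptions chosen for illustration.

```python
import re

# Illustrative patterns only; a real tool would use far more robust detection.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}
BLOCK = "\u2588"  # solid block character standing in for the black rectangle


def find_and_redact(text):
    """Return (redacted_text, hits), where hits are (label, start, end) spans."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.start(), match.end()))
    chars = list(text)
    for _, start, end in hits:
        # Same-length replacement keeps every recorded offset valid.
        chars[start:end] = BLOCK * (end - start)
    return "".join(chars), hits
```

Returning the list of hits alongside the redacted text gives a reviewer an audit trail of what was removed and where.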

3. Additional Content Sources

The content in both Google Suite and Microsoft Office 365 is the typical target for compliance and eDiscovery. However, there are many data types in the wild that are not included in these productivity suites. Other content sources such as social media (Twitter, Facebook, Snapchat, LinkedIn and others) are also typical targets for litigation and compliance, and phone text messaging is another important source. Some tools can ingest, index and search over 80 different content sources, including resources like WebEx, Skype, Box and others. The eDiscovery Platform can index hundreds of document types, which is far more than any other tool available in the market.

4. Audio Indexing

Electronic document productions often contain a variety of media files, but productions can only be searched for matching text. Because text cannot be extracted directly from audio files, they are invisible to the search engine. Fortunately, some compliance and eDiscovery applications, including the eDiscovery Platform, now offer the ability to index audio files, permitting those files to be searched with text-based queries. If audio indexing is not available, some review tools will flag audio files and other files that lack text content so that they can be reviewed manually.

5. Optical Character Recognition (OCR)

OCR is to image files what audio indexing is to audio files: both locate text in non-textual media. Sometimes document sets include images taken with a smartphone or camera, and those images could be pictures of documents or even screenshots. OCR is necessary for the information in these images to be visible to the search index.

6. Email Threading

Emails are some of the most common sources of electronic evidence and they can also be the most revealing. Having adequate email review functionality is critical for any document review project. Each email in the review tool should be grouped with its attachments, along with other emails in the “chain” to which the email belongs. For example, if you are viewing an email that is a reply to an original message and that was subsequently replied to, there should be links to the original message and the later replies.
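As a rough illustration of how a "chain" can be reconstructed, the sketch below groups messages by following the standard `Message-ID` and `In-Reply-To` headers back to the first message in each conversation. This is a simplified approach under stated assumptions, not the method of any particular review tool; the function name is hypothetical, and real tools also use the `References` header, subjects and attachments.

```python
from collections import defaultdict
from email import message_from_string


def build_threads(raw_messages):
    """Group Message-IDs into threads keyed by the Message-ID of the thread root."""
    parent = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = msg["Message-ID"]
        if msg_id is None:
            continue  # cannot be threaded without an identifier
        reply_to = msg["In-Reply-To"]
        parent[msg_id.strip()] = reply_to.strip() if reply_to else None

    def root_of(msg_id):
        seen = set()  # guards against malformed, circular reply chains
        while parent.get(msg_id) and msg_id not in seen:
            seen.add(msg_id)
            msg_id = parent[msg_id]
        return msg_id

    threads = defaultdict(list)
    for msg_id in parent:
        threads[root_of(msg_id)].append(msg_id)
    return dict(threads)
```

If the original message was never collected, its replies are still grouped together under the missing message's identifier, which tells the reviewer that part of the conversation is absent.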

7. Machine Learning

Machine learning (also known as “Predictive Coding” or “Technology Assisted Review”) is an exciting technology that offers the promise of locating relevant documents accurately and automatically by relying on patterns and inference to decide what is relevant. Some compliance and eDiscovery experts believe that machine learning can locate relevant documents faster and more accurately than keyword searching. When working on a case with a large data set (e.g., multiple terabytes of electronically stored information), I would recommend choosing a platform that has machine-learning capabilities. Better yet, choose a system that provides intelligent review, which automates decisions based on the outcomes of a reviewer’s previous decisions.

8. Data Visualization

Data visualization runs the gamut from simple pie graphs showing the relative frequency of various file types in the source data to complex diagrams showing the volume of email traffic between different persons of interest in the case over time. While a well-designed data visualization can help litigators see the forest and not just the individual trees, visualizations are often gimmicky and not very helpful. It’s important to have an idea of what you’d like to see depicted in visualizations first, and then assess whether a specific tool supplies that need.

When looking for a partner to fully comply with legal, compliance and data privacy needs, there are obviously many features, functions and benefits to consider. The best way to obtain a good solution for your organization is to be as educated as possible about the technology and compare the options. Fortunately, a good tool will pay for itself (usually in the first incident) by making document review more efficient and helping you to locate “the smoking gun” evidence in more of your cases.

The Bottom Line

At Veritas, we take our customers’ eDiscovery and compliance needs seriously and provide a single-source solution for these needs. However, no discussion of eDiscovery is complete without an understanding of the eDiscovery process itself. The Electronic Discovery Reference Model (EDRM) is the foundational workflow that encompasses all the necessary steps for successful search and response procedures (please see the diagram below). The Veritas eDiscovery Platform is an end-to-end solution that encompasses the eDiscovery process and provides tools that meet your compliance, Freedom of Information Act (FOIA) and California Consumer Privacy Act (CCPA) needs in a single, easy-to-use web-based application.


Veritas has worked closely with our customers and their investigators / litigators to understand the challenges they face, regarding compliance, investigation and litigation issues.   Since many of our customers have faced the same issues, we have incorporated the more common ones into our eDiscovery solution. 

Discovery tools need to collect from various content sources, and this diversity can be a challenge for some tools. The challenge keeps growing as the number of requests for information increases, typically due to regulatory requirements such as data privacy rules and government information requests. This ever-increasing number of compliance requests makes data collection a constant issue and, in turn, makes processing the data and making it available for review a burdensome task.

Whether you decide on in-house or external review, the costs are continually increasing; external review contractors, for example, charge by the page. In a large investigation or litigation, these costs tend to snowball out of control because of the time it takes to manually review large amounts of data. When this process is performed for compliance or regulatory requirements, time is something you do not have: in many cases the data needs to be identified, collected, reviewed and produced in a very short amount of time.

The solution to all of the challenges above is the eDiscovery Platform collections module, which lets customers collect from various content sources through a single user interface. Veritas’ customers can choose either to have the IT department perform the collections or to allow the legal, compliance or investigation team to do their own collections, bypassing IT entirely. The latter option removes the burden of these collections from the IT department, leaving IT to continue its normal work and the collecting to the investigation team. It also gives the investigation team full control over where and from whom to collect data, where to send the collected data and how the process is managed. The collections module identifies custodians (the subjects of the search), provides a desktop/laptop search and collection tool, builds an interactive map of custodians and their data sources, collects to a preservation store (legal hold) and filters data by keyword and metadata. It provides a single interface to collect from various sources and reports on what was collected and when, thereby allowing the organization to collect data in a defensible manner.

 

The next phase is the processing of the data.  During this step, the processing and analysis module enables rapid and accurate filtering, processing (indexing), searching, and data analysis in multiple formats and languages.  Let us think about indexing. Earlier we discussed the number of items indexed in Exchange Online and Google Suite, which as we reported, is only a handful of document types. The eDiscovery Platform can index hundreds of document types AND performs OCR on images, processes audio and video files for search and categorizes data through multiple policies, making for an easier search. Using the Veritas eDiscovery Platform Processing and Analysis module, corporations, government agencies, and law firms can perform early case assessments and are able to rapidly cull down data. This culling process reduces the overall electronic discovery costs.  As an integrated part of the eDiscovery Platform, the Processing and Analysis Module also supports the iterative workflows required during real-world electronic discovery. This solution delivers deep insight into case facts and enables a new level of transparency and defensibility throughout the electronic discovery process.

The eDiscovery Platform’s Review and Production module accelerates the review process.  It provides unprecedented scalability and introduces the flexibility to deploy case dependent linear review and predictive coding and intelligent review, expediting document review. This solution also eliminates the need to create load files and enables an iterative electronic discovery workflow. Other features included in this module are concept-based search (find the secret), audio and video keyword search, document redaction (bulk redact, find & redact and other important redaction features), conversation threading (learn who sent what to whom), item tagging, and machine learning for technology assisted review. The full featured toolset allows an organization to produce data in multiple formats for court presentation, handoff to opposing counsel or turnover for data privacy and FOIA requests. As one of four modules in the Veritas eDiscovery Platform, users benefit from one seamless application that supports the entire electronic discovery lifecycle.

Another function of the eDiscovery Platform is the Legal Hold module, which streamlines and automates legal hold management. This module enables legal teams to satisfy the duty to preserve, from anticipation to completion of litigation, by providing a repeatable workflow. As a component of the eDiscovery Platform, it gives users one seamless application to manage hold notices and rapidly identify and collect critical data on demand. Some of the features of the Legal Hold module are: hold notices, reminders, escalations, custodian survey creation, a custodian portal (to answer requests) and a complete reporting structure. The Legal Hold module minimizes the risk of sanctions while providing the highest level of defensibility across the entire electronic discovery lifecycle.

Closing

Unlike the O365 Compliance Center and the Google Suite tools, the Veritas eDiscovery Platform (eDP) is a standout in the industry. eDP provides a simple user interface with remarkable tools that make the process of eDiscovery feel like a trip to Amazon. Whether you are looking for the smoking gun to prove your case or fulfilling a compliance request, the eDiscovery Platform is the tool of choice for many organizations, legal teams and compliance officers.
