Building Effective Information Governance with Data ... - Netwrix

Building Effective Information Governance with Data Discovery and Classification


Table of Contents

Executive Summary

1. Introduction
1.1 Why automation is critical

2. Data Classification and Risk Management
2.1 Making more accurate risk assessments
2.2 Reducing risk by minimizing access permissions
2.3 Mitigating risk by automatically quarantining or redacting sensitive data

3. Using Data Classification to Protect Data
3.1 Enhancing other data security products (DLP, IRM, etc.)

4. How Data Classification Aids Compliance Activities
4.1 Monitoring and reporting on user permissions and activity involving sensitive data

5. Using Data Classification to Support Other Functions
5.1 E-discovery and litigation support
5.2 Eliminating duplicate, unnecessary and obsolete data

6. Choosing a Data Classification Solution
6.1 Advanced data discovery techniques
6.2 Agent-based vs. agentless architecture
6.3 Supported platforms and data types

7. Building Effective Information Governance with Netwrix Data Classification
7.1 Discover relevant data across multiple repositories faster
7.2 Prioritize your data security efforts
7.3 Enforce information governance policies
7.4 Revoke excessive permissions
7.5 Stay compliant with data retention mandates
7.6 Reduce the total cost of storage by cleaning up low-value information
7.7 Improve the efficiency of data management technologies

About the Author

About Netwrix

Executive Summary

Information governance is essential for every organization today. Whether you are concerned about the security of your critical business data or need to comply with increasingly complex regulations, a solid data discovery and classification (DDC) solution is a wise investment. In fact, as the volume and variety of data have exploded, DDC has become a business necessity.

DDC solutions scan data across your IT infrastructure and tag it according to its value and sensitivity. That way, IT teams and data managers can ensure it is handled and protected appropriately, for example by putting proper data access controls in place, keeping permissions up to date, and implementing the right backup and restore capabilities.

This eBook explores the capabilities and benefits of DDC solutions, as well as what to look for when selecting a DDC solution. You’ll discover how a DDC solution can help you save time and money while minimizing the risk of security incidents and compliance failures.


1. Introduction

With the amount and variety of data exploding around the world, organizations are facing unprecedented data sprawl: data is stored in every nook and cranny of the enterprise IT infrastructure, and no one knows which files contain personally identifiable information (PII), information about particular projects, intellectual property (IP) or other valuable or regulated content. As a result, they struggle to protect information adequately, comply with legal mandates, weed out duplicate and redundant data, and empower employees to find the content they need to do their jobs.

The value of data classification:

• Protect sensitive data against misuse or loss as required by compliance regulations

• Guard proprietary data to preserve its business value

• Save time and money by tailoring data protection methods to each type of data

• Retain data as long as required by compliance or business requirements and dispose of it responsibly

• Reduce the risk of fines and other expenses associated with security incidents and compliance failures

• Improve the efficacy of other data management tools, which can use the tags that the DDC solution embeds in content

1.1 Why automation is critical

Data discovery and classification overcomes all of these challenges. But today, both the discovery and classification processes simply must be automated.

First, the discovery process has to be able to find data no matter where it’s stored: in databases, on SANs or NAS in company data centers and wiring closets, in the cloud, on server-attached storage, on standalone devices, and more. Note that we are talking about both structured data and unstructured information like documents, PowerPoint presentations, pictures and emails, and about both physical and virtualized platforms. Manual processes are simply not a scalable approach to data discovery in today’s complex IT ecosystems.

Similarly, classifying data manually is simply too labor-intensive, time-consuming and error-prone to be a practical approach for all but the smallest of companies. In particular, manual data classification suffers from the following issues:

• Inconsistency. Different people will classify similar documents in different ways.

• Inaccuracy. Employees with their own jobs to attend to often fail to classify data at all or simply pick the first tag in the list to expedite the process.

• Inflexibility. As company requirements and regulations change, no one has the time or inclination to update the tags on gigabytes or terabytes of existing data.

• Failure. As users realize that data is not classified correctly, they will quit trusting the process and the whole project will fail.

Automating data discovery and classification overcomes these limitations by making the process reliable, accurate and continuous. The discovery process finds data across the IT environment, and the classification process checks it for the various types of content it is set to classify. For example, it can spot PII by looking for data patterns that indicate names, dates of birth, addresses, phone numbers, financial information, health information, and Social Security numbers. And it can re-classify the data later as needed due to organizational or regulatory changes.
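To make the idea of pattern-based classification concrete, here is a minimal Python sketch. It is an illustration only, not how any particular DDC product works: the tag names, the `PII_PATTERNS` rule set and the `classify` function are all hypothetical, and real solutions use far richer rules than three regular expressions.

```python
import re

# Hypothetical rule set: tag name -> pattern. These are illustrative
# assumptions, not the rules of any real DDC product.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "us_phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "date_of_birth": re.compile(
        r"\b(?:DOB|Date of Birth)[:\s]+\d{1,2}/\d{1,2}/\d{4}\b", re.IGNORECASE
    ),
}


def classify(text: str) -> list[str]:
    """Return the sorted list of tags whose pattern occurs in the text."""
    return sorted(tag for tag, pattern in PII_PATTERNS.items() if pattern.search(text))


if __name__ == "__main__":
    sample = "Employee SSN: 123-45-6789, phone 555-123-4567"
    print(classify(sample))
```

Because the rules live in one dictionary, re-classifying after a policy change amounts to editing the rule set and running `classify` over the content again.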

2. Data Classification and Risk Management

2.1 Making more accurate risk assessments

Accurate risk management is critical for organizations today. Overestimate your risk and you might fail to take advantage of opportunities that could increase your market share or profitability. Underestimate your risk and you can face schedule or budget overruns, legal complications, and compliance violations.

DDC solutions can help you make more accurate risk analyses, which in turn lead to more effective risk mitigation and contingency planning. However, two capabilities are crucial: flexibility and accuracy.

Flexibility. Most DDC solutions come with a number of predefined criteria for discovering and classifying data. But they also need to empower users to easily customize the rules and create new categories, so all their data will be properly classified according to their unique requirements.

Accuracy. It’s difficult to properly assess and mitigate risks when data is misclassified. The data discovery process needs to be thorough and the classification process needs to be highly accurate.

2.2 Reducing risk by minimizing access permissions

The single most important best practice for reducing risk is to implement the principle of least privilege. By granting each user and system account the minimum permissions necessary for it to operate successfully, you reduce the ability of each account to cause damage, whether it is being used by the legitimate owner or has been taken over by an attacker or malware.

DDC solutions help you know which data merits strong restrictions and oversight, and which data can be made freely available to employees, customers, partners and other groups. This information is critical for assigning privileges properly and conducting effective privilege attestations.

2.3 Mitigating risk by automatically quarantining or redacting sensitive data

Some data classification solutions offer additional risk mitigation features. Some can automatically move sensitive data that is discovered in improper locations to a secure quarantine area where it is protected while a data manager determines where it should reside. Some tools can also automatically redact sensitive information from documents that need to be disseminated for compliance or business reasons.
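The automatic redaction mentioned in section 2.3 can be pictured with a small Python sketch. It assumes, purely for illustration, that the only sensitive values are U.S. Social Security numbers in the common `NNN-NN-NNNN` form; the `redact` function and the placeholder text are hypothetical and do not represent any vendor's implementation.

```python
import re

# Assumption for this sketch: SSNs written as NNN-NN-NNNN are the only
# sensitive values that need to be masked.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str, placeholder: str = "[REDACTED]") -> tuple[str, int]:
    """Return the redacted text and the number of values that were masked."""
    return SSN_PATTERN.subn(placeholder, text)


if __name__ == "__main__":
    redacted, count = redact("SSN 123-45-6789 on file")
    print(redacted, count)
```

Returning the replacement count alongside the text makes it easy to log how much sensitive content each disseminated document contained.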

3. Using Data Classification to Protect Data

Data classification can also help organizations ensure the availability, integrity and confidentiality of their data. A DDC solution can help you prevent accidental and deliberate data modification or loss by enabling you to evaluate your data management strategies and strengthen your security posture. For example, you can protect sensitive data from being sent outside the organization or copied to a thumb drive, and ensure you have appropriate backup and recovery processes in place based on the value and sensitivity of the data.

The best DDC solutions also offer integration with monitoring and alerting tools that can notify you about suspicious activity that could put sensitive data at risk, as well as streamline regular review of user permissions and attempts to access certain types of data.

3.1 Enhancing other data security products (DLP, IRM, etc.)

The tags that DDC solutions assign to data can improve the effectiveness of other data management tools, such as data loss prevention (DLP), information rights management (IRM), systems management, and backup and recovery solutions. For example, without data classification, companies must back up everything to make sure they don’t miss anything of value. But data classification empowers them to tune their backup criteria so that mission-critical data gets the highest level of redundancy and lower priority data gets a lower tier of protection.

Indeed, any software tool that needs to make decisions about how to treat files based on their content can benefit from the automatic tagging of the data classification process.
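As a rough illustration of how a downstream tool might consume classification tags, the Python sketch below maps tags to backup tiers. The tag names, tier names and the `backup_tier` function are hypothetical. Untagged files fall back to the strongest tier, mirroring the point above that without classification everything has to be backed up.

```python
# Hypothetical tags and tiers, listed from highest to lowest priority.
TIER_BY_TAG = [
    ("mission-critical", "tier-1"),
    ("internal", "tier-2"),
    ("public", "tier-3"),
]


def backup_tier(tags: set[str]) -> str:
    """Pick the backup tier of the highest-priority tag present on a file."""
    for tag, tier in TIER_BY_TAG:
        if tag in tags:
            return tier
    # No known tag: treat the file as if it could be valuable.
    return "tier-1"


if __name__ == "__main__":
    print(backup_tier({"public", "mission-critical"}))
    print(backup_tier({"public"}))
```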

4. How Data Classification Aids Compliance Activities

Compliance regulations are increasing in number and scope, and packing greater penalties for failures. Some, like Sarbanes-Oxley (SOX), even include civil and criminal penalties for corporate leaders who fail to ensure the accuracy of certain activities and filings.

DDC solutions help you ensure and prove compliance by automatically identifying data that is subject to specific compliance regulations so you can put proper controls in place. You can also make more accurate decisions about data retention because you know which regulations and policies apply to which data. For instance, the U.S. Internal Revenue Service has very specific guidelines on which types of corporate data must be retained and for what length of time.

As noted earlier, some solutions can also quarantine, redact and move data. These features facilitate response to requests from data subjects under GDPR and other data privacy mandates. For example, instead of facing the nightmare of having to manually discover and delete all sensitive data related to a particular individual, you can use a DDC solution to automate this process and help keep your company in compliance. Similarly, some solutions can automate the redaction of sensitive data in documents that must be exposed to users who do not have the rights to view it. Some DDC solutions even offer workflows that can automatically revoke access rights for users who should not be able to access sensitive data.

4.1 Monitoring and reporting on user permissions and activity involving sensitive data

Discovering and classifying your data is a critical first step, but many compliance standards also require you to maintain tight control over activity around sensitive data. Therefore, it’s essential to choose a DDC solution that integrates easily with a monitoring and alerting tool.

5. Using Data Classification to Support Other Functions

Finally, data discovery and classification can also help you automate e-discovery and storage optimization.

5.1 E-discovery and litigation support

Lawsuits and other legal actions are very common for companies of all sizes these days. They often come with inflexible deadlines and long lists of criteria for content that could be evidence and therefore must quickly be retrieved or put on hold. With data classifications, metadata tags, time/date stamps and other pertinent information from your DDC solution, performing e-discovery as part of legal requests is much more manageable.

In fact, considering the high stakes of modern legal warfare, data classification solutions offer a significant competitive advantage to corporations with a high volume of legal inquiries, e-discovery and litigation support. Indeed, these use cases alone can easily justify the expenditure for a DDC solution.

5.2 Eliminating duplicate, unnecessary and obsolete data

Data classification can also help companies find and eliminate data that doesn’t need to be kept, such as:

• Duplicate documents

• Files that serve no business purpose, such as routine corporate correspondence for closed and abandoned projects and other outdated files

• Data that has no business, legal or regulatory value for the company

Eliminating this data reduces risk (since it cannot be lost or stolen), reduces data management and storage costs, and improves user productivity by reducing clutter.

6. Choosing a Data Classification Solution

Because data discovery and classification has such profound effects on an organization’s security, compliance and operations, it’s essential to evaluate candidate solutions carefully. Here are some specific features to look for.

6.1 Advanced data discovery techniques

Compound terms

Solutions that can use compound terms to search for and classify sensitive data deliver more accurate results. Look for a tool that can handle all three variations:

• Hyphenated compound words: Two or more words separated by hyphens, such as father-in-law or agent-based

• Open compound words: Words that are commonly written together but are separated by a space instead of a hyphen, like truck stop and luxury hotel

• Closed compound words: Words that are typically written by combining two words, such as rooftop and beachfront

Keyword stemming

Keyword stemming looks at the root of a word and automatically includes all its variations in the search process. For instance, the word discovery yields the stem discover, and the solution will also find instances of discovering, discovered, etc.

Though online search engines commonly use this technique to find the greatest number of matching documents and sites, do not take it for granted that every data classification solution offers keyword stemming out of the box. Some tools require data managers to “teach” the search engine about the variations of a word or even require them to manually define all possible variants.
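The two search techniques in this section, compound-term matching and keyword stemming, can be sketched in a few lines of Python. This is a deliberately naive illustration: `naive_stem` is a hypothetical suffix stripper written for this example, not a real stemming algorithm such as Porter's, and `compound_pattern` simply accepts a hyphen, a single whitespace character or nothing between the parts of a compound term.

```python
import re

# Suffixes tried in order by the toy stemmer (an assumption of this sketch).
SUFFIXES = ("ing", "ed", "y", "s")


def naive_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def find_stem_matches(text: str, keyword: str) -> list[str]:
    """Return the words in text that share a stem with the keyword."""
    target = naive_stem(keyword)
    return [tok for tok in re.findall(r"[A-Za-z]+", text) if naive_stem(tok) == target]


def compound_pattern(parts: list[str]) -> re.Pattern:
    """Match hyphenated, open and closed spellings of a compound term."""
    body = r"[-\s]?".join(re.escape(part) for part in parts)
    return re.compile(rf"\b{body}\b", re.IGNORECASE)


if __name__ == "__main__":
    text = "We discovered files while discovering shares; discovery continues."
    print(find_stem_matches(text, "discovery"))
    pattern = compound_pattern(["agent", "based"])
    print([bool(pattern.search(s)) for s in ("agent-based", "agent based", "agentbased")])
```

A production-grade tool would rely on a proper linguistic stemmer or lemmatizer; the point here is only that one keyword or compound term can cover several surface forms without the data manager listing each variant by hand.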


6.2 Agent-based vs. agentless architecture

Like many types of management software, DDC solutions come in both agent-based and agentless forms, as illustrated in Figure 2. Which is better depends on your needs and priorities.

FIGURE 2: Agent-based vs. agentless architecture

[Figure: two panels. Agentless architecture: a DDC server with servers, storage, network devices and cloud resources. Agent-based architecture: a DDC server and a data collector/agent proxy, with servers, storage and cloud resources that run a local agent, plus network devices without a local agent.]

Advantages of an agentless architecture include:

• Faster, easier deployment because no software has to be installed on end nodes

• Less overhead for IT staff because they do not have to maintain agents

• No risk of agents dragging down system performance

• Better security, since there are no agents to be hacked

However, the pro-agent crowd argues that agentless solutions fail to deliver the features, scalability and robustness that agents make possible. An agent-based approach can also reduce network traffic. For devices where an agent cannot run, a proxy agent process might be able to provide data classification functionality.

6.3 Supported platforms and data types

Last but not least, you need a DDC solution that supports as much of your data as possible. Be sure to consider whether you use Oracle, SQL Server, file storage, cloud storage and so on. Be sure to include both your structured and unstructured data. Not all solutions support structured data, and unstructured data is notoriously difficult to discover and parse properly. Therefore, pay careful attention to not just the platforms listed as “supported” but the quality of the results that each solution delivers.

7. Building Effective Information Governance with Netwrix Data Classification

7.1 Discover relevant data across multiple repositories faster

During e-discovery and legal proceedings, you need to be able to collect all relevant files across your on-premises and cloud-based storage, such as Windows file servers, SharePoint and SQL Server. With Netwrix Data Classification, you can quickly find everything you need from a single platform.

[Screenshot: Netwrix Data Classification search results for “Project 15245 codename Barrel”. The listed documents include https://enterprise-my.sharepoint.com/sites/Documents/Project15245_Financial.xlsx, https://enterprise-my.sharepoint.com/sites/Projects/Project_15245/Release policy.docx and https://enterprise.com/Personal/AJakobs/Projects/Project_15245/Project map.pdf, each with a relevance score and a text preview. Available actions: Suggest Clues, Add to Working Set, Add to Negative Working Set, Re-Index, Re-Classify.]

7.2 Prioritize your data security efforts

To prioritize your security and governance efforts, you need to know where various types of information are located. Netwrix Data Classification empowers you to accurately identify the data that matters most to your business so you can manage and protect it properly.

[Screenshot: the “Content Distribution” report, which shows the distribution of content grouped by source, by taxonomy or by item. In this example it is grouped by source and covers the GDPR, PII and IP taxonomies across file shares such as \\fs\share\customers (518 documents), \\fs\share\public and \\fs\share\internal, the SharePoint site https://enterprise.sharepoint.com, and the SQL Server database Server=SQL\Enterprise, Database=Accounting.]

7.3 Enforce information governance policies

It is important for organizations to not only formulate strong information governance policies, but also make sure they are being obeyed company-wide. Since relying solely on each employee’s judgement is risky, it is essential to automate policy enforcement. With Netwrix Data Classification, you can automate many critical information governance processes, such as spotting sensitive files that surface in unsecured locations, moving them to a secure quarantine area and alerting the appropriate staff about the event.

[Screenshot: summary of a “Quarantine Workflow”. Source type: SharePoint; source: http://sp.enterprise.com/sites/Sales. Run this workflow against: documents with specific classifications, classified as PII (All Terms). Action: Migrate document to File System; destination: \\fs\internal\quarantine\customer data; Maintain Folder Structure: No; Move/Copy: Move; If File Already Exists: Append Migration Date; Redact Document: No.]
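The quarantine step described in section 7.3 can be approximated with a short Python sketch. It is a generic illustration, not Netwrix's workflow engine: the function names, the `"PII"` tag check and the file-naming rule are assumptions made for this example, loosely modeled on the idea of moving a file and appending the migration date when the name is already taken.

```python
import logging
import shutil
from datetime import date
from pathlib import Path, PurePath
from typing import Optional


def resolve_target(name: str, existing: set[str], today: date) -> str:
    """Keep the name if it is free, otherwise append the migration date."""
    if name not in existing:
        return name
    p = PurePath(name)
    return f"{p.stem}_{today.isoformat()}{p.suffix}"


def quarantine(path: Path, quarantine_dir: Path, tags: set[str]) -> Optional[Path]:
    """Move a file tagged as PII into the quarantine folder and log an alert."""
    if "PII" not in tags:
        return None  # nothing to do for files without the sensitive tag
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    existing = {p.name for p in quarantine_dir.iterdir()}
    target = quarantine_dir / resolve_target(path.name, existing, date.today())
    shutil.move(str(path), str(target))
    # Stand-in for notifying the appropriate staff about the event.
    logging.warning("Quarantined %s to %s", path, target)
    return target


if __name__ == "__main__":
    print(resolve_target("customers.xlsx", {"customers.xlsx"}, date(2019, 5, 11)))
```

Keeping the naming rule in a pure function (`resolve_target`) separates the policy decision from the file move, which makes the rule easy to check without touching a real file system.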

7.4 Revoke excessive permissions

Many organizations don’t know how much of their sensitive or business-critical data is accessible by large groups of users and don’t have a quick way to find out. This gap in data access governance often leads to data leaks and compliance problems. With Netwrix Data Classification, you can easily create a workflow that will automatically remove permissions to sensitive data, including inherited permissions, from groups like Everyone.

[Screenshot: a workflow on \\fs1\Accounting with the rule condition “GDPR > UK passport Classified” and an Update Permissions action. Action parameters shown in the edit dialog: Remove Access From: Everyone; Grant Access To: J.Smith; Grant Access Permission Level: Full Control; Remove Inherited Permissions.]
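The logic of a permission-revoking rule like the one in section 7.4 can be modeled in a few lines of Python. This sketch treats an access control list as a plain dictionary of principal to permission level, which is a simplifying assumption; it does not call any Windows or Netwrix API, and the function and tag names are hypothetical.

```python
def update_permissions(
    acl: dict[str, str], remove_from: str, grant_to: str, level: str
) -> dict[str, str]:
    """Return a new ACL without `remove_from` and with `grant_to` set to `level`."""
    updated = {who: perm for who, perm in acl.items() if who != remove_from}
    updated[grant_to] = level
    return updated


def apply_rule(tags: set[str], acl: dict[str, str]) -> dict[str, str]:
    """Tighten the ACL only for content carrying the (hypothetical) GDPR tag."""
    if "GDPR" not in tags:
        return dict(acl)
    return update_permissions(acl, "Everyone", "J.Smith", "Full Control")


if __name__ == "__main__":
    acl = {"Everyone": "Read", "Finance": "Modify"}
    print(apply_rule({"GDPR"}, acl))
```

Returning a new dictionary rather than editing the input in place makes it straightforward to compare the before and after state, for example in a report of which permissions a rule would change.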

7.5 Stay compliant with data retention mandates

Meet data retention requirements and improve your records management by automatically finding specific types of records, such as tax returns, across your IT environment and enforcing the required retention policies around them.

[Screenshot: summary of a “Retention Workflow”. Source type: File; source: \\fs1\Finance\Tax Records. Run this workflow against: documents with specific classifications, classified as PII (All Terms). Action: Migrate document to File System; destination: \\fs2\Archive\Tax Records; Maintain Folder Structure: No; Move/Copy: Move; If File Already Exists: Append Migration Date; Redact Document: No.]
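A retention policy of the kind described in section 7.5 boils down to a lookup from classification tag to retention period. The Python sketch below shows that decision in isolation. The tags and the periods in `RETENTION_YEARS` are hypothetical placeholders chosen for the example, not legal guidance and not values from any regulation or product.

```python
from datetime import date

# Hypothetical retention periods in years, for illustration only.
RETENTION_YEARS = {"tax_record": 7, "project_doc": 3}


def retention_action(tag: str, created: date, today: date) -> str:
    """Return 'retain', 'dispose' or 'review' for a record with the given tag."""
    years = RETENTION_YEARS.get(tag)
    if years is None:
        return "review"  # no policy defined for this tag
    try:
        expires = created.replace(year=created.year + years)
    except ValueError:
        # Created on 29 February and the target year is not a leap year.
        expires = created.replace(year=created.year + years, day=28)
    return "dispose" if today >= expires else "retain"


if __name__ == "__main__":
    print(retention_action("tax_record", date(2012, 4, 15), date(2019, 5, 11)))
```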

7.6 Reduce the total cost of storage by cleaning up low-value information

How much money and effort are you wasting on storing and maintaining duplicate, obsolete and trivial data? Netwrix Data Classification can automatically find and clean up low-quality and low-value files, such as duplicate or old versions of documents, so users won’t have to slog through piles of clutter and you won’t have to constantly purchase more storage.

[Screenshot: the “Near Duplicate Detection” report, which details near-duplicate documents across the index. Near duplicates are detected as a background process; to enable it, turn on the “Near Duplicate Detection” option in the Indexer Settings and rebuild the desired sources. Filters shown: Match Precision (%): 95; Minimum Text Length: 100; Maximum Text Length Difference: 20. Report columns: PageId, PageUrl, Duplicate PageId, Duplicate PageUrl, Relevancy, Text Length Difference. The example rows pair documents such as \\fs1\shared\Product Management\Release 5.4.docx with \\fs1\shared\Product Management\2019\Release 5.4 draft.docx and \\fs1\shared\Product Management\Various Documents\Release 5.4.docx, and \\fs1\shared\Product Management\Various Documents\Release Notes.docx with \\fs1\shared\HR\Other\Release Notes.docx.]
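To give a feel for what near-duplicate detection involves, here is a small Python sketch built on the standard library's `difflib.SequenceMatcher`. It is not the algorithm Netwrix uses. The three thresholds borrow the names and default values of the report filters (match precision 95, minimum text length 100, maximum text length difference 20), and interpreting the length difference as a percentage is an assumption of this example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(
    docs: dict[str, str],
    match_precision: float = 95.0,
    min_text_length: int = 100,
    max_length_difference: float = 20.0,
) -> list[tuple[str, str, float]]:
    """Return (doc_a, doc_b, similarity %) for pairs that pass all thresholds."""
    results = []
    for (name_a, text_a), (name_b, text_b) in combinations(docs.items(), 2):
        # Very short texts produce unreliable similarity scores, so skip them.
        if min(len(text_a), len(text_b)) < min_text_length:
            continue
        longer = max(len(text_a), len(text_b))
        length_diff = 100.0 * abs(len(text_a) - len(text_b)) / longer
        if length_diff > max_length_difference:
            continue
        # autojunk=False avoids the heuristic that ignores frequent characters
        # in long inputs, which would distort the ratio for repetitive text.
        matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
        score = 100.0 * matcher.ratio()
        if score >= match_precision:
            results.append((name_a, name_b, round(score, 2)))
    return results


if __name__ == "__main__":
    base = "release notes for version 5.4 " * 10
    docs = {"a.docx": base, "b.docx": base, "c.docx": "x" * 300, "d.docx": "short"}
    print(near_duplicates(docs))
```

Comparing every pair of documents character by character is quadratic and only workable for small sets; tools that operate on large indexes typically rely on hashing or fingerprinting techniques instead.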

7.7 Improve the efficiency of data management technologies

Forcing your expensive data management and protection solutions to process all data, regardless of its sensitivity or business value, bogs them down and drives up costs. Using the highly accurate classification tags written by Netwrix Data Classification, you can increase the effectiveness of your endpoint security, data loss prevention and data management solutions.

[Screenshot: the “Recent Tagging” graph, which requires the “Auto-Classification Change Log” feature to be enabled (Config -> Classifier). With the display period set to the past week and no URL filter, the chart covers the taxonomy terms AMEX, Diners Club, Discover, JCB, Mastercard, UnionPay and VISA on a scale of 0 to 30.]

About the Author

Earl Follis

Earl is a 30-year veteran of the computer industry. His experience in IT training, marketing, technical evangelism and market analysis covers many areas, including networking, systems management, disaster recovery and business continuity, and application performance monitoring. Along the way, he has authored many eBooks, white papers and articles.

About Netwrix

Netwrix is a software company that enables information security and governance professionals to reclaim control over sensitive, regulated and business-critical data, regardless of where it resides. Over 10,000 organizations worldwide rely on Netwrix solutions to secure sensitive data, realize the full business value of enterprise content, pass compliance audits with less effort and expense, and increase the productivity of IT teams and knowledge workers.

Founded in 2006, Netwrix has earned more than 150 industry awards and been named to both the Inc. 5000 and Deloitte Technology Fast 500 lists of the fastest growing companies in the U.S.

For more information, visit www.netwrix.com.

Corporate headquarters: 300 Spectrum Center Drive, Suite 200, Irvine, CA 92618

Phones: 1-949-407-5125; toll-free (USA): 888-638-9749; 1-201-490-8840; +44 (0) 203 588 3023

Other locations: 565 Metro Place S, Suite 400, Dublin, OH 43017; 5 New Street Square, London EC4A 3TW

France: +33 9 75 18 11 19
Spain: +34 911 982608
Germany: +49 711 899 89 187
Netherlands: +31 858 887 804
Hong Kong: +852 5808 1306
Sweden: +46 8 525 03487
Italy: +39 02 947 53539
Switzerland: +41 43 508 3472

Social: netwrix.com/social