How To Take Control of Your Data
eDiscovery Compendium, Vol. 2 No. 8
Issue 08/2013 (11), July

INTRODUCTION
Come, e-Discovery counsel throughout the land, and please don't ignore what you can't understand. During a time of political and social upheaval in the early 1960s, American songwriter Bob Dylan penned "The Times They Are a-Changin'." In our community, change continues to occur as data volumes grow.
The importance of data classification by relevant business purpose, prior to processing, cannot be overstated or misunderstood. Proactive technology choices such as classification create numerous benefits downstream during a litigation event, as well as upstream in managing information governance across an enterprise. Poor-quality data might not be searchable, but that must not diminish its relevance or the need to understand its content. Whereas predictive coding employs technology that relies upon the searchability of good-quality text, what is your workflow for the boxes of paper and the unsearchable electronic files created from third-generation scans?
Big Data is growing beyond your command, and the old methods are rapidly aging. In 2013, unstructured data continues to increase exponentially in volume. For the longest time, our industry has followed the Four Ps (People, Process, Platform, and Protocol) of the decidedly reactive Electronic Discovery Reference Model. Clients relate that their chief problems tend to revolve around productivity, accuracy, risk mitigation, and defensibility of process, and these all have an impact on the bottom line: their legal spend. However, the time has come to understand a Fifth P: PROACTIVE. We now know that not all workflows are equal. An abundance of interest in proactive workflows led us to the client engagements described below.
BODY
Client 1 is a Fortune 500 vertically integrated company that relies on several managed review providers and outsourced early case assessment (ECA) tools; ostensibly, they made purchasing decisions based on relationships and price, rather than on an underlying awareness of their needs or the changing technologies in the marketplace. The client shared that they were concerned about the high cost of first-level document review. In an effort to identify cost savings, we offered to re-review their data from a recent case to illustrate how machine learning via a classification tool could provide improved client knowledge about their data, prior to processing and especially prior to managed review, so that intelligent staffing choices could be made for a future managed review.
What is the subject matter?
- Similar subject matters may engender similar protocols for review
- Case matter profiles may be replicated for the client
- Demonstrable expertise from measurable historic results

What is the approximate volume of documents?
- Working assumptions are confirmed
- Greater predictability for the duration of review
- Which pricing model to apply: hourly or fixed price per document

What are the file types?
- Impact on timing and workflow requirements
- Historic file-type management on this type of case
- Identify any special types of skilled reviewers needed

What are the average pages per document? What are the average pages per GB? What are the average documents per GB?
- Collectively, these three questions assist in identifying an atypical document population (a small worked example follows this list)
- Such an identification can alert us to special staffing concerns before the review begins
- Comparison against historic workflow profiles for anomalies that may impact timing and other services, such as privilege log creation or redaction

How many custodians?
- Comparison against historic hit rates
- Prioritization for workflow and best practices
- Staffing needs

How many issue tags?
- Historic responsiveness rates compared to the current case
- Best practices favor 10 tags or fewer
- Discussion of potential areas of data uncertainty prior to review, so that data may be strategically batched to mitigate the costly re-review that results from a client protocol change

What drives your purchasing decision to choose one provider over another?

Is there a feature or aspect of your current service that you consider important? Why?
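As a quick illustration of how the three volume ratios above can flag an atypical document population before review begins, here is a minimal sketch in Python. The metric names, collection figures, and historic ranges are hypothetical, offered only to show the arithmetic; they are not drawn from the engagement described in this article.

    # Illustrative sketch only: the ratios and thresholds are hypothetical.
    # It shows how the three volume questions can flag an atypical population.

    def profile_population(total_docs, total_pages, total_gb):
        """Return the three intake ratios for a collection."""
        return {
            "pages_per_doc": total_pages / total_docs,
            "pages_per_gb": total_pages / total_gb,
            "docs_per_gb": total_docs / total_gb,
        }

    # Hypothetical historic ranges drawn from prior matters of the same type.
    HISTORIC_RANGES = {
        "pages_per_doc": (2, 15),
        "pages_per_gb": (3_000, 12_000),
        "docs_per_gb": (500, 4_000),
    }

    def flag_anomalies(profile, ranges=HISTORIC_RANGES):
        """List any ratio that falls outside the historic workflow profile."""
        flags = []
        for metric, value in profile.items():
            low, high = ranges[metric]
            if not low <= value <= high:
                flags.append(f"{metric} = {value:,.1f} outside historic range {low}-{high}")
        return flags

    if __name__ == "__main__":
        current = profile_population(total_docs=85_000, total_pages=2_400_000, total_gb=40)
        for warning in flag_anomalies(current):
            print("REVIEW PLANNING ALERT:", warning)

In this toy run, the unusually high pages-per-document and pages-per-GB figures would prompt the kind of staffing and workflow conversation the questions above are designed to trigger.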
We classified data for its relevant business purpose as a precursor to creating a seed set for predictive analytics. We compared the effectiveness of the tool to an existing in-house product. We identified best practices for seed set creation protocols, and can share some lessons learned about the process that will benefit future clients.
For testing, we utilized a sample batch of data consisting of a mix of Excel, Word, PowerPoint, Adobe PDF, and MS Outlook email. The test data set was initially provided, and we then proceeded to analyze the data, create categories for classification, identify a seed set, and run an automated classification process on the remainder. The results were provided to us and we loaded them into a Relativity database for testing. The deliverable included a list of identified categories, a list of documents used in the seed set, and a load file listing all documents and their corresponding category. Once all categorization sets were completed, we built saved searches to identify discrepancies. We employed a human element to validate the classifications performed and to create a blind seed set for comparison.

The subject matter expertise of the engagement engineer plays a role in the way that seed sets are created. The new classification technology was able to classify a higher percentage of documents and showed better optimization across multiple file types than any of the in-house categorization sets created by incumbent products. The ability to classify on a relevant business purpose with a robust file identification engine is perhaps one of the largest differentiators between competing technologies. The human intelligence married to the artificial intelligence of machine learning is an important step in the iterative process of seed set creation. Subject matter knowledge differs from person to person based on understanding of the type of case, the case in point, familiarity with the use of technology, and professional experience and exposure to the documents and concepts that clients provide for production. The blind classification set created by the subject matter expert was found to match favorably (72%) with the machine learning classification performed by our tool.
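To make that comparison concrete, the following is a minimal sketch of how agreement between the expert's blind seed set and the machine's categories might be measured once both are exported as load files. The file names and field names are hypothetical; the approach is simply a document-by-document comparison of category assignments.

    # Hedged sketch: compare a blind human classification against machine
    # output loaded from two CSV load files. Field names are hypothetical.

    import csv
    from collections import Counter

    def load_categories(path, doc_id_field, category_field):
        """Read a load file and return {document_id: category}."""
        with open(path, newline="", encoding="utf-8") as f:
            return {row[doc_id_field]: row[category_field] for row in csv.DictReader(f)}

    def agreement_report(machine, human):
        """Compare the two classification sets document by document."""
        shared = machine.keys() & human.keys()
        matches = sum(1 for doc_id in shared if machine[doc_id] == human[doc_id])
        disagreements = Counter(
            (human[doc_id], machine[doc_id])
            for doc_id in shared
            if machine[doc_id] != human[doc_id]
        )
        return matches / len(shared), disagreements

    if __name__ == "__main__":
        machine = load_categories("machine_categories.csv", "DocID", "Category")
        human = load_categories("blind_seed_set.csv", "DocID", "Category")
        rate, disagreements = agreement_report(machine, human)
        print(f"Agreement: {rate:.0%}")  # e.g. the 72% figure reported above
        for (expected, predicted), count in disagreements.most_common(5):
            print(f"  SME said {expected!r}, machine said {predicted!r}: {count} docs")

The disagreement counts are as useful as the headline percentage: the most common expert-versus-machine category pairs point at where the seed set needs another iteration.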
LESSONS LEARNED

- Classification yields a higher quality of prioritized data, different from a linear review.
- Define categories and identify where overlap occurs. The output is a prioritized classification of potentially responsive material; certain categories may require a second look as part of the iterative process, prior to managed review.
- The client should be encouraged to provide a list of responsive terms and privilege names during custodial collection, for the purpose of data mapping and classification, prior to the project kickoff.
- A Potential Privilege filter can be applied based upon a list of counsel names, mitigating the impact of inconsistent coding in a traditional linear review (a small sketch of this idea follows this list).
- On a case-by-case basis, confirm with your vendor who from their pool of candidates and subject matter experts will be provided for supervised machine learning and seed set creation.
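Here is a minimal sketch of the Potential Privilege filter mentioned above, assuming the client has supplied counsel names and outside-firm email domains. The sample names, domains, and email field layout are hypothetical; the point is simply that flagged items can be batched to the privilege team before first-pass review.

    # Hedged sketch of a "Potential Privilege" filter. Counsel names, domains,
    # and the email dictionary layout are hypothetical assumptions.

    COUNSEL_NAMES = {"jane doe", "richard roe"}          # supplied by the client
    COUNSEL_DOMAINS = {"outsidefirm.com", "lawdept.example.com"}

    def is_potentially_privileged(email):
        """Flag an email whose participants include known counsel."""
        participants = [email.get("from", "")] + email.get("to", []) + email.get("cc", [])
        for address in participants:
            address = address.lower()
            if any(domain in address for domain in COUNSEL_DOMAINS):
                return True
            if any(name in address for name in COUNSEL_NAMES):
                return True
        return False

    def batch_for_review(emails):
        """Split a collection into privilege-team and standard review batches."""
        privilege, standard = [], []
        for email in emails:
            (privilege if is_potentially_privileged(email) else standard).append(email)
        return privilege, standard

Because the flag is applied once, up front, the same document cannot receive two inconsistent privilege calls from two different first-pass reviewers.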
If your time to you is worth savin', to summarize: proactive pre-processing classification takes a large corpus of unstructured data and organizes it around a central business purpose or theme. This categorization prioritizes, and in turn reduces, the number of documents that undergo a traditional linear first-pass review for responsiveness. A reduced volume of documents leads to a reduced labor cost, where fewer reviewers are needed to accomplish the same task, perhaps in fewer hours, days, or weeks. The potentially responsive documents are classified and prioritized around the relevant purpose, and the potentially non-responsive documents are set aside for later review, if necessary. No coding decisions to tag have been made at this stage. Neither have non-responsive documents had to be processed in order to determine that they do not meet the threshold requirements for responsive production.

Through the use of proactive classification, we have transformed managed review into an engineered review. It is a more efficiently staffed project. We train and qualify our review team on the classification of the data and the alignment of who, what, when, and how. Everything that we learned from the classification process is a point of knowledge for the case, and this is conveyed through the delivery of a production binder documenting every step taken, for defensibility. Better trained reviewers make fewer material errors because the training on the quality control process is very robust. Productive reviewers complete batches faster because they are not distracted by uncategorized linear data. Rather, they are tuned in to the proactive prioritization of the classifications. Thus, they are more likely to spot outliers and departures in behavior patterns, analyze sentiment in a message, and spot differences not readily found in a traditional linear review (see Frame: Classification Use Case).
Classification Use Case

Classification organizes the data, and themes emerge. Trends and occurrences become readily visible as patterns of behavior: who was talking to whom, about what, and how and when did it occur? Every month for nine months, Smith and Jones had a meeting and exchanged three emails with six attachments. There were always three spreadsheets, one HR Word document related to goal measurement, a PowerPoint presentation for the board of directors, and an agenda. There were multiple drafts of the PowerPoint. There were requests for legal advice that made some of the documents potentially privileged. Documents involving lawyers and legal domain names, identified in advance through the use of classification tools as potentially privileged, were set aside for the privilege review team instead of having to be reviewed twice, at the risk of an inconsistent call. Classification identifies the frequency of events, conversations, and third parties to a conversation. Then one day, in the 10th month, Smith and Jones introduced Davis, a competitor, to the mix of their regularly patterned behavior. All of a sudden, Smith and Jones were scheduling a meeting with Davis to discuss fixing a price.

Consider the following questions. Could you have found that in a traditional linear review? When would you have found it? Would you have noticed the frequent pattern of behavior for nine months and then spotted the anomaly, Davis, in the 10th month? What if you had different reviewers on the two batches, a distinct likelihood? In a classification system, you could find it with frequency reports and then, using the iterative process of machine learning, train the machine to find other documents like that smoking gun whose existence was previously unknown. Data can be batched specific to this particular incident before reviewers are in their seats, and classification can provide valuable case knowledge in instances where you are not necessarily aware of what you did not know. One by-product for the corporation that engages in classification is an understanding of its data in terms of knowledge management. Classification can deliver reports on the frequency of nouns and verbs, both for defensibility of the process undertaken (for use in Rule 26 meet and confers) and for the identification of the next triggering event. In this manner, the wheel is not recreated each and every time there is a triggering event.
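The frequency reporting behind the Smith-and-Jones example can be sketched in a few lines. This is an illustrative toy, assuming messages with a date string and a participant list; it is not the classification tool's actual reporting, but it shows how a "Davis" who appears only in month ten stands out against nine months of stable behavior.

    # Simplified sketch of a participant-frequency report. The message
    # structure (date string, participant list) is a hypothetical assumption.

    from collections import defaultdict

    def monthly_participants(messages):
        """Map 'YYYY-MM' -> set of participants seen that month."""
        by_month = defaultdict(set)
        for msg in messages:
            month = msg["date"][:7]              # e.g. "2012-10" from "2012-10-15"
            by_month[month].update(p.lower() for p in msg["participants"])
        return by_month

    def find_new_participants(by_month, baseline_months=9):
        """Report participants who first appear after the baseline period."""
        months = sorted(by_month)
        baseline = set().union(*(by_month[m] for m in months[:baseline_months]))
        alerts = []
        for month in months[baseline_months:]:
            newcomers = by_month[month] - baseline
            if newcomers:
                alerts.append((month, sorted(newcomers)))
        return alerts

    # Usage: find_new_participants(monthly_participants(messages)) might return
    # [("2012-10", ["davis"])], surfacing the outlier before review begins.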
Applications include: Classification Services for Information Governance; Due Diligence and Audit Support; Data Mining on Physical Records (aka "What's in the Box?"); Records Validation and Verification.

Services include: Subject matter expertise (SME) in machine learning and data extraction; classification training and certification for clients and partners.

Products include: Haystac RetenGine, which processes enterprise data, and Haystac Web, which processes data on the internet.

Contact: +1 781-820-7616. Email: [email protected]. On the web: http://www.haystac.com. To read more from Haystac, please visit http://www.haystac.com/whitepapers.
Client 2 is a Fortune 100 commercial bank. Because we have a very deep understanding of this bank's litigation matters, we undertook three custom tasks that would be considered atypical by any vendor standard in the e-discovery industry. While many providers would shy away from undertaking such projects, these were the perfect test cases to employ technology, identify efficiencies, and share results both with our banking client and with other companies who face the same challenges (see Frame: Why is data classification a good idea for your organization?).

Ugly data is poor-quality data that originated as a paper document at some point in its life. One easy-to-digest example is the process of contract execution, where a contract was printed and signed, then scanned and sent to a counterparty or additional signatory for signing, where it was re-scanned and returned. That is at least three generations removed from the original. Depending upon the quality of the printout and scan, there may be some loss of fidelity during conversion to TIFF and OCR. Recent work for clients in the oil and gas industry required the cleanup of a fax document for the production of a maintenance report related to a well (see Figures 1 and 2).
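For readers who want a feel for what such cleanup can involve, here is one possible pre-OCR pass using the open-source Pillow and pytesseract libraries. The filter choices, threshold, and file name are illustrative assumptions, not the workflow actually used on the well maintenance fax described above.

    # One possible cleanup pass for a low-fidelity fax or third-generation
    # scan before OCR. Filter choices and threshold are illustrative only.

    from PIL import Image, ImageFilter
    import pytesseract

    def clean_and_ocr(path, threshold=160):
        """Grayscale, despeckle, and binarize a scanned page, then run OCR."""
        page = Image.open(path).convert("L")                  # grayscale
        page = page.filter(ImageFilter.MedianFilter(size=3))  # remove fax speckle
        page = page.point(lambda p: 255 if p > threshold else 0)  # binarize
        return pytesseract.image_to_string(page)

    if __name__ == "__main__":
        text = clean_and_ocr("well_maintenance_fax.tif")  # hypothetical file name
        print(text[:500])

Even a modest pass like this can turn unsearchable "ugly data" into text that is good enough to classify, which is the prerequisite for everything else described in this section.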
The transfer of assets and collection of work product across several vertical markets has resulted in the records for each asset being compiled into a single PDF, usually with no index. This condition is prevalent in the oil and gas and mortgage industries, where the records associated with an asset are created as these large PDFs. The holder of these PDFs is forced to reconstruct the original document collection in order to determine the presence of critical records and/or recreate a database of key attributes contained within the documents. In addition, the quality of the OCR text is usually poor, severely limiting the usefulness of search-based interrogation. Manually splitting these PDFs into their original documents is an expensive and time-consuming process.

We were able to train on a seed set of documents and automatically split 21 loan files into 1,900 PDFs, the original document set, accurately identifying the logical document breaks and auto-classifying each document to a high level of accuracy. New document naming conventions are auto-generated, usually by appending the page range of the new document to the original file name (see Figure 3). The client provided a list of 13 categories into which to place documents. For comparison, we had our Haystac technology go head to head with human reviewers. The technology was able to categorize all of the documents and left fewer documents in the OTHER category than the off-shore human review team did. The advantage of Haystac's machine-based process is quicker recognition of error patterns and their correction, thus eliminating the inherent variability of human judgment. The process can be applied to millions of pages of PDFs and produce results in a fraction of the time of its manual counterpart.
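The splitting step itself is mechanical once the logical breaks are known. Below is a minimal sketch using the open-source pypdf library, with the break pages passed in by hand; it is not the Haystac splitter, and the break detection (the hard part) is assumed to have happened upstream. It does, however, reproduce the naming convention of appending a page range to the original file name.

    # Minimal sketch: split one large PDF at pre-detected logical breaks.
    # Uses the open-source pypdf library; break detection is assumed upstream.

    from pypdf import PdfReader, PdfWriter

    def split_loan_file(path, break_pages):
        """Split one large PDF at the given 0-based page indexes."""
        reader = PdfReader(path)
        boundaries = sorted(set(break_pages) | {0}) + [len(reader.pages)]
        stem = path.rsplit(".", 1)[0]
        outputs = []
        for start, end in zip(boundaries, boundaries[1:]):
            if start == end:
                continue
            writer = PdfWriter()
            for page_index in range(start, end):
                writer.add_page(reader.pages[page_index])
            out_name = f"{stem}_p{start + 1}-{end}.pdf"   # original name + page range
            with open(out_name, "wb") as f:
                writer.write(f)
            outputs.append(out_name)
        return outputs

    # Usage: split_loan_file("loan123.pdf", break_pages=[4, 9, 15]) would produce
    # loan123_p1-4.pdf, loan123_p5-9.pdf, loan123_p10-15.pdf, and loan123_p16-N.pdf.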
Why is data classification a good idea for your organization?

Classification equals preparedness for all stakeholders. Who in your enterprise is involved in making these decisions? Do you have any of these concerns?
- Indexing and remediation of legacy data for storage
- What are the new record-keeping requirements under Dodd-Frank?
- How will we meet the new statutory regimes for reporting under Dodd-Frank?
- Are we holding data for too long?
- Periodic M&A events that require great due diligence
- Regulatory compliance
- Future litigation

C-Suite Management
Classification provides a material benefit to C-Suite stakeholders. Reduction of labor cost occurs at the most variable portion of a managed review. Improved productivity reaches higher-priority data, where strategic decisions are made. Greater accuracy allows the production of data sooner. Classification can establish cost-effective predictability for compliance and mitigate the costs found in a risk profile.

Records Management
Classification can reduce annual storage costs at the terabyte and petabyte level, and defensible deletion reduces enterprise risk. A repository allows clear insight into the language used to discuss common business events:
- Who was talking to whom,
- When these conversations were occurring, and
- Identification of a pattern of expected behavior, thus enabling the visibility of outliers, anomalies, and departures from the pattern: in essence, needles in a haystack.
Classification enables the creation of a corporate repository and promotes the reusability of data, so that you no longer have to recreate the wheel.
Auto Extraction of Text for Logical Document Determination

Poor-quality text documents can constitute a significant percentage of stored documents. Scanned documents are typically stored as TIFF or PDF files on file servers and in email archives, and are usually poorly indexed, making them hard to find using enterprise search engines. In addition, important records stored in boxes and files are also poorly indexed at the box or file level, making the box or file contents blind to the enterprise. Manually indexing these documents is resource intensive and costly, yet locating important records is very meaningful for satisfying audit, investigatory, and document control objectives, as well as for meeting information governance requirements. Document titles are often a key indicator of the purpose of a document, so accurately and cost-effectively determining the title means a document's importance as a record can be determined. Determining the title allows classifying the document to a business purpose using database mapping.

Using a soft, dictionary-based approach to identifying document titles, a dictionary has been compiled from common business function-based documents and is supplemented with actual document headers gleaned by sampling client data. Image processing extracts the title fragment, and algorithmic processing determines the most probable title match. The user interface contains an editor which allows the user to view machine results, enter new headers, and correct errors. On this task, we extracted document titles that were meaningful to the useful categorization of the poor-quality OCRed documents.
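As an illustration of the dictionary-matching idea (not Haystac's implementation), the Python standard library's difflib module can resolve a noisy OCR title fragment to the closest dictionary entry. The dictionary entries, cutoff, and fragment handling below are assumptions made for the sketch.

    # Hedged sketch of dictionary-based title matching over noisy OCR text.
    # Dictionary entries and the similarity cutoff are illustrative assumptions.

    import difflib

    TITLE_DICTIONARY = [
        "promissory note",
        "deed of trust",
        "title insurance policy",
        "loan application",
        "well maintenance report",
    ]

    def probable_title(ocr_first_lines, cutoff=0.6):
        """Return (best_matching_title or None, raw_fragment)."""
        # Take the first non-empty OCR line as the candidate title fragment.
        fragment = next((line.strip() for line in ocr_first_lines if line.strip()), "")
        matches = difflib.get_close_matches(fragment.lower(), TITLE_DICTIONARY,
                                            n=1, cutoff=cutoff)
        return (matches[0] if matches else None), fragment

    # Usage: probable_title(["  PROMlSS0RY N0TE  ", "..."]) should still resolve
    # to "promissory note" despite the OCR character substitutions.

Fuzzy matching against a curated dictionary is what makes this workable on third-generation scans, where exact string search would miss the garbled header entirely.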
There were 68 fields of entry on a custom reporting document, for which items such as loan #, amounts, codes, dates, borrower names, mortgage lenders, title insurance, and other information were required. Our auto-extraction technology was able to accurately populate the data into an Excel spreadsheet in response to the government request for production.
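A rough sense of how such field extraction can work is sketched below: regular-expression templates per field, one row per document, written to a CSV file that opens in Excel. The three patterns shown are hypothetical stand-ins for a handful of the 68 fields; the real engagement used our auto-extraction technology, not this script.

    # Illustrative sketch of template-based field extraction into a
    # spreadsheet-friendly CSV. Field patterns are hypothetical examples.

    import csv
    import re

    FIELD_PATTERNS = {
        "loan_number": re.compile(r"Loan\s*(?:No\.?|#)\s*[:\-]?\s*(\S+)", re.I),
        "amount": re.compile(r"Amount\s*[:\-]?\s*\$?([\d,]+\.?\d*)", re.I),
        "borrower": re.compile(r"Borrower\s*[:\-]?\s*(.+)", re.I),
    }

    def extract_fields(document_text):
        """Pull each field's first match out of one document's OCR text."""
        row = {}
        for field, pattern in FIELD_PATTERNS.items():
            match = pattern.search(document_text)
            row[field] = match.group(1).strip() if match else ""
        return row

    def write_report(documents, out_path="extraction_report.csv"):
        """documents: iterable of (doc_id, text). Writes one row per document."""
        with open(out_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=["doc_id"] + list(FIELD_PATTERNS))
            writer.writeheader()
            for doc_id, text in documents:
                writer.writerow({"doc_id": doc_id, **extract_fields(text)})

Blank cells in the output double as a quality-control report: any document where a required field failed to extract can be routed to a human for a second look.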
CONCLUSION

A fully defensible, engineered review process mitigates a client's risk profile. Reduction in legal spend through a more efficient engineered review, where fewer attorneys are needed for first-level review, is, in essence, doing more with less. A corporate knowledge base is created, signifying an advance in the reuse of data. Accuracy and robust quality control protocols enable the direction and allocation of litigation spend towards higher-value legal functions, sooner. Oh, the times, they are a-changin'.
Benjamin S. Marks is a consultant on eDiscovery and Information Governance initiatives. Most recently, he assisted in the development of a document review center in Charlotte, North Carolina, and in a new product introduction for an e-Discovery service provider. An entrepreneurial, strategic-minded lawyer with a business operations background, Ben's prior work on staffing managed reviews affords him the insight to identify subject matter expertise for teams, develop proactive workflows, and assemble responses to RFPs. Prior to law school, Ben was the founder of Eco Specialties and Design, an environmentally themed promotions company. Today, when he's not building seed sets or reading about Dodd-Frank's impact on enterprise risk management, Ben follows Orioles baseball, attends live music events, enjoys cooking, and runs with his puggle in Baltimore, Maryland. He holds a J.D. and an Environmental Certificate from Pace University School of Law.