David Rostov, Chief Executive Officer of Lighthouse Document Technologies, has 20 years of experience in senior leadership positions and managing …




Electronic discovery insight & commentary from David Rostov at Lighthouse Document Technology, offering updates related to e-discovery, EDD, metadata, forensic & data collections, data recovery, ESI consulting, litigation support, document review, paper discovery & producing electronic data.

Electronic Discovery Made Easy

Requesting Data: More is Better


We were recently asked for our thoughts on how best to strategically sift through large data sets when you are on the receiving end of the data. Here is a summary of what we recommended.

Meet and Confer. The goal is to get the other side to produce as much information as possible tailored to work best with your case strategy and review tools.

Optimal Production on the Receiving Side. We recommend requesting processed natives: data that the other side has converted into a litigation database. The data would include extracted text, OCR, metadata and all images. The files would be numbered with document IDs so that both sides can keep track of the documents. Benefits include lower cost, easier searching, and the ability to cluster documents and track discussion threads.


TIFF Production with Metadata. If the other side will provide only TIFF images, we recommend that you require them to provide as much metadata as possible. At a minimum, you will want to get the extracted text, OCR and all major metadata fields. Here are some ways that you can improve the efficiency of your review.


TIFF Production with Limited Metadata. If the data provided consists of TIFFs with limited metadata (e.g., existing discovery or hard-copy documents), then focus on improving the review workflow. Here are a few ways to improve reviewer efficiency:

To summarize, the more metadata the producing party provides, the better. Therefore, ask up front for as much metadata as possible, and be specific about the technical format required.

TAGS:  Electronic Discovery Strategy, TIFF, data, eDiscovery, efficiency, native, near, productions, receiving, requesting, review, strategy

De-Duplication: Different Tools, Different Results


If two emails are identical, shouldn’t they be considered duplicates?

Unfortunately, in eDiscovery it is not quite so simple. The industry standard is to calculate an MD5 hash value for every email in a population and then treat emails that share a hash value as duplicates (a process referred to as de-duping). An MD5 hash value is the output of a mathematical algorithm that reduces any document to a fixed-length, 128-bit fingerprint. Identical inputs always produce the same value, while virtually any change to the input produces a different one, so the hash provides a way to identify each unique document. Ralph Losey has written some very thoughtful commentary on hash values. He makes a very interesting case for using hash values as a replacement for Bates numbering: the 21st century version of Bates numbering.
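The basic de-duping step can be sketched in a few lines of Python. This is a minimal illustration that assumes each document is available as raw bytes keyed by a hypothetical document ID. It uses the standard `hashlib` module and does not reproduce how any particular eDiscovery tool implements de-duplication.

```python
import hashlib
from collections import defaultdict


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of raw bytes as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()


def group_duplicates(docs: dict[str, bytes]) -> dict[str, list[str]]:
    """Group (hypothetical) document IDs by the MD5 hash of their content.

    Every ID in a group has byte-for-byte identical content, so all but
    one copy could be suppressed from review.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for doc_id, content in docs.items():
        groups[md5_hex(content)].append(doc_id)
    return dict(groups)


if __name__ == "__main__":
    sample = {
        "DOC-001": b"Quarterly forecast attached.",
        "DOC-002": b"Quarterly forecast attached.",
        "DOC-003": b"Quarterly forecast attached!",  # one character differs
    }
    for digest, ids in group_duplicates(sample).items():
        print(digest, ids)
```

On the sample, the two identical documents share one group. The document that differs by a single character gets a completely different hash and lands in its own group.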

The challenge is that each of the major eDiscovery software tools uses its own proprietary definition of which inputs go into the hash calculation. In the hash value world, even a one-character difference in any included input produces a completely different value, so two emails that are substantively identical can be treated as distinct. As a result, some documents end up being reviewed and/or produced more than once. The table below summarizes the inputs used to calculate hash values in three leading tools: Clearwell, LAW and IPRO. As you can see, each one is different.

TAGS:  Bates, Bates Numbering, Clearwell, De-Dup, De-Duplication, Electronic Discovery Processing, Hash Values, IPRO, LAW, MD5, Ralph Losey

Table: Email hash inputs in Clearwell, LAW and IPRO

Fields compared: Subject of the email; Email date (sent date); Body content; Attachment names

Yes indicates that the field is included in the hash computation. No indicates that it is not included.

IPRO hash methodology can be customized based on the settings outlined above.

As a recent real-world example, we worked on an eDiscovery project in which a custodian sent an email to eight people within his company. By any reasonable standard, this means there were eight exact duplicates of this email in the population. However, the software tool used to process the data categorized it as four different emails. This happened because the company had several internal email servers (a fairly common occurrence in larger corporations), and each time the email was handed off to a different internal server, a slightly different time was recorded in one of the metadata fields.
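The effect of choosing which fields go into the hash can be sketched as follows. This is a hypothetical illustration. The field names and both field sets are assumptions made for the example and are not the actual definitions used by Clearwell, LAW or IPRO. It models eight copies of one email that differ only in a server-stamped transport time, mirroring the example above.

```python
import hashlib
import json

# Hypothetical field sets for illustration only; these are NOT the actual
# hash definitions used by Clearwell, LAW or IPRO.
CONTENT_FIELDS = ["subject", "sent_date", "body", "attachment_names"]
WITH_TRANSPORT_TIME = CONTENT_FIELDS + ["transport_time"]


def email_hash(email: dict, fields: list) -> str:
    """MD5 over the selected fields, serialized as unambiguous JSON (name, value) pairs."""
    pairs = [[name, email.get(name, "")] for name in fields]
    payload = json.dumps(pairs, ensure_ascii=False)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def count_unique(emails: list, fields: list) -> int:
    """Number of distinct hash values, i.e. how many 'unique' emails would be reported."""
    return len({email_hash(e, fields) for e in emails})


def make_copies() -> list:
    """Eight copies of one message, delivered through four internal servers (sample data)."""
    base = {
        "subject": "Q3 pricing",
        "sent_date": "2010-06-01T09:15:00",
        "body": "Please see the attached pricing sheet.",
        "attachment_names": "pricing.xls",
    }
    copies = []
    for recipient in range(8):
        msg = dict(base)
        # Each of four servers stamps a slightly different hand-off time.
        msg["transport_time"] = f"2010-06-01T09:15:0{recipient % 4}"
        copies.append(msg)
    return copies


if __name__ == "__main__":
    emails = make_copies()
    print(count_unique(emails, WITH_TRANSPORT_TIME))  # 4
    print(count_unique(emails, CONTENT_FIELDS))       # 1
```

If the server-stamped field is included, the eight copies split into four "unique" emails. If it is left out, they collapse to one. Neither definition is inherently wrong, but the two give different review and production counts for the same data.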


Although each software tool calculates the hash values slightly differently, this does not necessarily mean that one tool is better or worse than another or that one is inherently more accurate. If hash values were to become the Bates stamp of the 21st century, the electronic discovery industry could benefit from a standard method of calculating hash values. Absent a standard, it is important to be aware of this issue in case you run across it.

Why Are So Many Email Collections Corrupted?


Many email collections are done improperly and produce corrupted files. Unless properly repaired, corrupted email files cannot be processed for litigation. The most common email collection problems arise with Microsoft Exchange Server collections (.PST files). Improperly collected Exchange data adds significant time and cost to the eDiscovery process. It also introduces an element of risk in terms of the overall integrity of the evidence.

Microsoft Outlook stores email in the .PST file format. Think of the PST as an expanding container file: for most custodians, all of their email resides in a few PST files.

Email collections are often performed by internal IT personnel, usually with the Microsoft Exchange Mailbox Merge Program (ExMerge.exe). This program enables a network administrator to extract data from mailboxes on an Exchange Server and merge it into mailboxes on the same or another Exchange Server. It copies the mailbox data from the source server into personal folder (.PST) files and then merges the data from those files into the mailboxes on the destination server. The most common practice is to copy the data while the custodian(s) are still logged into the system, which allows the custodian to continue working while the collection is occurring. This is the main cause of the file corruption: the system cannot properly synchronize the various sets of files, in particular slight differences in dates/times, while the custodian’s email account is active.


The good news is that there is a very simple and effective solution to this problem: make sure that the custodian is logged out of his/her account during the entire collection process and that the account has been properly synchronized with the server. It is always advisable to verify that the data was successfully collected before turning it over to your eDiscovery vendor or counsel. To verify the collected PST, open it in Outlook and use the Advanced Find function. If you do not see any messages in the view pane, this indicates that the collection was not successful and the data has been corrupted.
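You can also run a quick supplementary check on a collected file's first bytes before handing it over. This sketch assumes the header layout published in Microsoft's [MS-PST] file format specification: a `!BDN` signature at offset 0, an `SM` client signature at offset 8 and a version number at offset 10. Passing this check only shows that the header is intact. A file can still be corrupted internally, so this complements the Advanced Find check rather than replacing it.

```python
import sys

PST_MAGIC = b"!BDN"   # dwMagic: first 4 bytes of a PST/OST file per [MS-PST]
PST_CLIENT = b"SM"    # wMagicClient at offset 8 for a PST (an OST uses b"SO")
HEADER_LEN = 12       # enough to cover dwMagic, dwCRCPartial, wMagicClient and wVer


def check_pst_header_bytes(header: bytes) -> str:
    """Sanity-check the start of a PST header; does NOT validate the rest of the file."""
    if len(header) < HEADER_LEN:
        return "too short to be a PST header"
    if header[0:4] != PST_MAGIC:
        return "missing !BDN signature"
    if header[8:10] != PST_CLIENT:
        return "not a PST client signature"
    version = int.from_bytes(header[10:12], "little")  # wVer, little-endian
    if version in (14, 15):
        fmt = "ANSI"
    elif version >= 23:
        fmt = "Unicode"
    else:
        fmt = "unrecognized"
    return f"header OK ({fmt} format, wVer={version})"


def check_pst_header(path: str) -> str:
    """Read the first bytes of a collected file and check its PST header."""
    with open(path, "rb") as f:
        return check_pst_header_bytes(f.read(HEADER_LEN))


if __name__ == "__main__":
    for pst_path in sys.argv[1:]:
        print(pst_path, "->", check_pst_header(pst_path))
```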

Paraben has a tool called E-mail Examiner that does a good job of ensuring that the email collection is forensically sound. Their product is more expensive than ExMerge and not as widely used. However, it is designed specifically for litigation and investigations.

Repairing Corrupted PSTs

If the collection was not done properly and the data is corrupted, repairing a PST usually requires a number of hours of senior technical time. As a rough estimate, a 10 GB PST will take a few hours to repair. We recommend two tools for this type of repair. Both tools scan the files to locate corrupted data and then attempt to recover the damaged information.

1.      EasyRecovery File Repair. This tool is from Kroll Ontrack.

2.      Outlook Recovery Tool Box. This is a Microsoft tool that is usually included with Outlook.

Unfortunately, not all corrupt PSTs can be repaired. If a PST cannot be repaired, you will need to have the data re-collected. Be prepared for an unhappy custodian when you show up to re-collect their data.

TAGS:  PST, Collections, Corrupted, Email, Exchange, Kroll, Microsoft, Outlook, PSTs, Paraben, Server

Text Messaging and Its Impact on eDiscovery


To date, most litigation electronic discovery requests are limited to custodian email and loose documents. The requests ignore custodian mobile phone data, in particular stored text messages. The next big eDiscovery collection trend for litigation will likely be the collection of text messages from mobile phones.

Text messaging is still viewed as something that only teenagers really use. However, the usage data on text messaging is quite revealing. Over 70% of Americans ages 25 to 49 use text messaging. The average number of texts sent per day per user in the US is over 10. In 2008, the number of text messages sent surpassed mobile phone calls. And text messaging is growing at 100 to 200% per year.

To put texting in its proper context, it is estimated that Americans send about 30 emails per day (the data on this is not very precise). This means that texting accounts for ¼ of the daily electronic correspondence sent in the US.
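Using the post's own rough figures of about 10 texts and about 30 emails per person per day, the one-quarter share works out as follows:

```latex
\frac{\text{texts}}{\text{texts} + \text{emails}} \approx \frac{10}{10 + 30} = \frac{10}{40} = \frac{1}{4}
```

The texting figure is a lower bound ("over 10"). If the email estimate holds, the actual share would therefore be somewhat higher than one quarter.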

The first step in any forensic investigation is identifying sources of evidence. Mobile phones store evidence in a variety of locations and media formats. Similar to desktop computers, most cell phones have internal memory and removable storage media (SD cards). Depending on the carrier, an internal SIM (Subscriber Identity Module) card stores pertinent information, such as phone numbers, contacts, and unique subscriber registration data.

As with computer collections, mobile device collections should be done in a forensically sound manner. This means that the data must be collected without changing the original device content. A forensic hash should be calculated on the collected data so that any subsequent change to the data can be detected. Keep in mind that the data on mobile devices is constantly changing (e.g., clock time, network data), so it is important to make an exact replica as quickly as possible.
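Recording a hash at acquisition and re-checking it later might look like the following minimal sketch, which uses Python's standard `hashlib`. SHA-256 is an assumed default here; MD5, discussed elsewhere on this blog, can be passed instead. The chunked read keeps large device images from having to fit in memory. The hash does not stop the data from changing. It lets you detect afterwards whether it has changed.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_image(path: str, recorded_digest: str, algorithm: str = "sha256") -> bool:
    """Re-hash the collected image and compare it with the digest recorded at acquisition."""
    return file_digest(path, algorithm) == recorded_digest.strip().lower()
```

In practice, the digest from `file_digest` would be written into the chain-of-custody record at collection time. `verify_image` would then be run again before analysis or production.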

The main challenge with mobile collections is that most cellular phones use a proprietary operating system. This is compounded by the fact that new mobile devices are constantly being introduced into the market, making it a challenge to stay current on collection tools. Often the hardest part of the collection is just having the right phone adapter on hand to transfer the data from the phone to the acquiring computer.

After making a copy of the phone data, the next step is to analyze it. The forensic tools available for analysis and processing are still in their early stages of development. However, a number of forensic tools are available, such as Paraben’s Device Seizure Toolkit and Guidance Software’s Neutrino. Paraben’s Device Seizure is probably the most common tool used both by law enforcement and for commercial litigation. These tools are very similar to traditional forensic software utilities and offer many of the same capabilities and functionality, such as text viewing and keyword searching. During the analysis phase, text messages, e-mails and contacts can be identified, undeleted (if necessary), searched, and exported for review or further processing. If you are interested in more information on mobile collections, the National Institute of Standards and Technology (NIST) has a good overview.
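The keyword-search step on exported messages can be illustrated with a small sketch. The `TextMessage` record and its fields are hypothetical stand-ins for whatever a forensic tool actually exports. The matching is a simple case-insensitive whole-word search and does not reflect how Device Seizure or Neutrino implement search.

```python
import re
from dataclasses import dataclass


@dataclass
class TextMessage:
    # Hypothetical record shape for a message exported from a forensic tool.
    sender: str
    sent_at: str
    body: str


def keyword_hits(messages: list, keywords: list) -> list:
    """Return (message, matched keywords) pairs using case-insensitive whole-word matching."""
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE) for kw in keywords
    }
    hits = []
    for msg in messages:
        matched = [kw for kw, pat in patterns.items() if pat.search(msg.body)]
        if matched:
            hits.append((msg, matched))
    return hits
```

The `\b` word boundaries assume keywords that begin and end with letters or digits. Terms that start or end with punctuation would need a different pattern.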

TAGS:  Collections, Encase, NIST, National Institute of Standards and Technology, Paraben's Device Seizure, eDiscovery, electronic discovery, forensic collections, litigation, mobile forensic collections, text messages


Pulling Dockets for Foreclosure Defense Online


Our office receives phone calls almost daily from people who are going through home foreclosure. When we get a call, all we need is the homeowner's name, home address, and the county where the home is located to pull the court docket online. The docket gives us all the information we need about the case, so we are on the same page as the homeowner we are trying to help. A county docket will typically contain:

Most of the time, this information is available online through the county website. There was a time when information like this could only be obtained by mail or phone. With this information readily available, we can quickly determine whether the home foreclosure is something we can defend against or a lost cause.

An important thing to remember is that our firm fights foreclosure to help buy homeowners time to work out an agreement with the bank. Common strategies include:

What homeowners need to understand is that all payments are stopped while the case is being defended in court. This allows the homeowner to save money while figuring out a new situation. For example, suppose the attorneys at The Orlando Foreclosure Attorney get a phone call from a person who has just received foreclosure papers in the mail. The attorney has a paralegal go online and pull the docket, and they determine whether the case is viable. The caller may be worried about not being able to afford an attorney but finds out that the firm charges only a small monthly payment and that they could stop paying their loan while the case is fought in court. The lawyer is able to meet with the client the same day and is retained as counsel. This would have taken quite a few more steps if the online docket had not been available.

Edit: The kind folks over at The Orlando Foreclosure Attorney requested we update their link. Many thanks to Attorney Franklin for his valuable insight on foreclosure defense.

TAGS:  Electronic Discovery Strategy, Foreclosure Defense attorney, data, eDiscovery, efficiency, pull docket online, near, productions, receiving, requesting, review, strategy