e-Discovery Software

Q&A - EDD Processing Software

Q: Can Discovery Assistant produce native load files without TIFFing the documents?

A: Queue the documents for conversion as 'metadata only'. That lets you produce source documents without TIFFing them. Most metadata other than extracted text is available (note: text from MSG files is still extracted).

Q: How long will it take to process a GB?

A: One GB produces approximately 70,000 pages and takes approximately 20 hours of processing time at a conversion time of about 1 second a page. Allow roughly another second a page for import/export/deblanking/Bates numbering, etc.

Q: Can you handle email from Eudora and Thunderbird?

A: We can handle native MIME email (similar to how Outlook Express messages are stored), and we think that is how Eudora and Thunderbird messages are stored. If some tweaking is required, we're willing to do the work.
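Thunderbird keeps each mail folder as an mbox file, which is a series of MIME messages concatenated into one file (as far as we know, Eudora's .mbx files use a similar layout). If a folder needs to be broken into individual MIME messages before loading, a minimal sketch using Python's standard `mailbox` module could look like this. The function name and output naming (`000000.eml`, ...) are hypothetical, not a Discovery Assistant convention.

```python
import mailbox
import os


def split_mbox(mbox_path, out_dir):
    """Write each message in an mbox file to its own .eml (MIME) file.

    Returns the list of files written, in mailbox order.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    # create=False: raise an error instead of creating an empty mailbox.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for i, msg in enumerate(box):
            path = os.path.join(out_dir, f"{i:06d}.eml")
            with open(path, "wb") as fh:
                # as_bytes() writes the RFC 822/MIME message without the mbox "From " line.
                fh.write(msg.as_bytes())
            written.append(path)
    finally:
        box.close()
    return written
```

The resulting .eml files are plain MIME messages, the same kind of input described in the answer above.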

Q: The customer needs multi-page TIFF, metadata, and bibliographical data in delimited text format with field headers [associated with the Doc ID number as well]. All parent IDs and attachment IDs must be captured.

A: If you export and choose Character Separated Text, then select all under Options. That gives you the full set of metadata fields that we extract and export, including Parent ID and Attachment ID. We can extract limited bibliographical data from Office, Excel, PDF, and email.
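Once the delimited file is exported, the parent/attachment relationships can be rebuilt from it. A minimal sketch, assuming a comma separator and hypothetical column headers 'DocID' and 'ParentID' (check the header row of your own export for the real field names):

```python
import csv
import io
from collections import defaultdict


def family_map(export_text, delimiter=","):
    """Map each parent DocID to the DocIDs of its attachments, in file order.

    Hypothetical column names: 'DocID' and 'ParentID'. A blank ParentID
    marks a top-level document (a message or a loose file).
    """
    families = defaultdict(list)
    reader = csv.DictReader(io.StringIO(export_text), delimiter=delimiter)
    for row in reader:
        parent = (row.get("ParentID") or "").strip()
        if parent:
            families[parent].append(row["DocID"])
    return dict(families)


sample = "DocID,ParentID,Subject\nD1,,Status report\nD2,D1,budget.xls\nD3,D1,memo.doc\nD4,,Lunch\n"
print(family_map(sample))  # {'D1': ['D2', 'D3']}
```

If a different separator character was chosen at export time, pass it as `delimiter`.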

Q: How would I set output to Eastern Standard Time, in non-military (12-hour) format?

A: We read timestamps as UTC, then convert them to local time. Set the machine you are working on to Eastern Standard Time and a non-military (12-hour) time format; that is how we will produce the dates and times.
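To spot-check a converted value by hand, the same UTC-to-local conversion can be reproduced in a few lines. This is a minimal sketch, assuming ISO-8601 UTC input and a fixed UTC-5 offset (standard time only, no daylight saving); on a system with time-zone data, `zoneinfo.ZoneInfo("America/New_York")` could be swapped in for DST-aware results.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-5 offset (Eastern Standard Time, no daylight saving).
EST = timezone(timedelta(hours=-5), "EST")


def utc_to_est_12h(utc_text):
    """Convert an ISO-8601 UTC timestamp like '2006-03-14T18:05:00Z' to
    Eastern time in 12-hour ('non-military') format."""
    utc = datetime.strptime(utc_text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return utc.astimezone(EST).strftime("%m/%d/%Y %I:%M %p")


print(utc_to_est_12h("2006-03-14T18:05:00Z"))  # 03/14/2006 01:05 PM
```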

Q: How do you handle corrupt or password-protected files? Placeholder file? Exception report but no processing? Placeholder file, set aside, and notify us?

A: The current behavior is to fail on the first attempted conversion/import. Failed items can be copied, decrypted/unlocked, then re-imported. Alternatively, you can do a pass-through (placeholder) for these items.

Q: How do I export in chronological order, earliest to latest date?

A: When exporting, users normally ask for parent/child sort order, but you can also sort and export by date/time. If you plan to sort on date, it's best to first set the machine's date format to YYYY-MM-DD. All formatted dates will then be sortable.
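The reason YYYY-MM-DD matters: when dates are compared as formatted text, MM/DD/YYYY compares the month first and mixes up the years, while YYYY-MM-DD text sorts in chronological order. A small illustration (the sample dates are made up):

```python
from datetime import datetime


def to_sortable(us_date):
    """Convert an MM/DD/YYYY date string to YYYY-MM-DD."""
    return datetime.strptime(us_date, "%m/%d/%Y").strftime("%Y-%m-%d")


us_dates = ["12/01/2005", "01/15/2006", "03/02/2005"]

# Sorting MM/DD/YYYY text compares the month first, so years get mixed up.
print(sorted(us_dates))                          # ['01/15/2006', '03/02/2005', '12/01/2005']
# Sorting YYYY-MM-DD text gives chronological order.
print(sorted(to_sortable(d) for d in us_dates))  # ['2005-03-02', '2005-12-01', '2006-01-15']
```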

Q: We also need a blowback set for our customer's office.

A: Do you mean hard copies? You can choose to print converted TIFF files (either stamped or unstamped) from Discovery Assistant.

Q: How do we export to CD?

A: You can export VOL\BOX, limit the output size to the capacity of a CD, then burn a CD from that exported data.

Q: When viewing image and text side by side, shouldn't the text scroll as well when I scroll the TIFF?

A: It's an interesting idea, and we will add it to our list of things to look at. The basic issue is that we don't know where the text sits relative to the image; all we know is that this is what printed to produce that TIFF file. We could probably still scroll the text as you scroll the image, though, and hope that we get it right.

Q: Can you print the 'All Files' screen so that you have a hard-copy list of File ID, Bates, Name, Status, Hash, etc.? Or can you export that to a CSV file?

A: You can copy the list to the clipboard and paste it into an Excel spreadsheet. The other alternative is to use our DAReportManager reporting tool, which can load the XML project file and convert it to XLS.

Q: How do you actually select a file from the list? I highlight it, then click a command such as Convert, and it says no files are selected and asks whether I wish to select all files.

A: The Convert button has an arrow beside it that shows a pop-up with two choices: All and Selected.

Q: What is the MTF file used for? Is it simply a TXT document that lets the attorney see all the metadata fields in an easy-to-read format?

A: The MTF file stores the metadata before we do an export. Our thinking was that when we add the search feature, we can search the MTF file without first having to do an export.

Q: Is there a way to see a dedup report of files that were duplicates? I think there is, but I didn't see it.

A: If you sort on the 'duplicated' key, then that gives you a list of all files for which there is a duplicate.

If you run the project file through Global Deduplication, that will identify the global primary file and the global count.

On any particular duplicate, you can list its parent, its siblings, or its duplicates by pressing one of the action buttons in the button bar.
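Outside Discovery Assistant, the same kind of duplicate report can be built from an exported list of file IDs and hash values. A minimal sketch, assuming (file_id, hash) pairs and treating the first file seen as the primary copy; this is an illustration, not how Discovery Assistant chooses its global primary file.

```python
from collections import defaultdict


def duplicate_groups(files):
    """Group (file_id, hash) pairs by hash.

    Returns {hash: (primary_file_id, copy_count)} for every hash that occurs
    more than once; the first file seen is treated as the primary copy.
    """
    by_hash = defaultdict(list)
    for file_id, digest in files:
        by_hash[digest].append(file_id)
    return {h: (ids[0], len(ids)) for h, ids in by_hash.items() if len(ids) > 1}


files = [(1, "aa"), (2, "bb"), (3, "aa"), (4, "aa"), (5, "cc")]
print(duplicate_groups(files))  # {'aa': (1, 3)}
```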

Q: Can you tell me what filtering technology you use (Native/Mapi, Verity, Stellent, or Open Source) for the following:

A: We use the following methods to extract files and metadata:

  • PST Extraction
    • we've written our own extractor, which in turn uses the Office OLE object.
  • Lotus Notes Extraction
    • our own tools, which in turn use the Lotus Notes DLL files.
  • Office file metadata extraction
    • Office OLE, where appropriate.
  • Other file types metadata extraction
    • PdfDump (our own tool); for all other file types we use the standard Windows API.
  • Imaging
    • the default for TIFF, DCX, PCX, PNG, and JPEG is our viewer. We could extract additional information, but currently use the Windows API (created / modified / accessed date-time, size).

File type is determined by reading the file's binary data to get an extension type. The extension type is then translated through the operating system to get a file type.

At the moment, we do not include an integrated search. However, we do support extraction of source, tiff, text, and metadata as separate files. Search tools can comb through these files looking for matches.

For conversions we rely on the following applications being installed:

Microsoft Office, Lotus Notes, Windows Internet Explorer, Adobe Acrobat Reader, and any other viewers necessary to open and print native files.

Conversions are done using the PrintTo interface. Custom converters exist for Word, Excel, PowerPoint, and internal Office / Notes file formats.

Q: Do you know what the limits are on the number of files you can process at a time? I have 3,295 files. I can't get it to load more than a couple hundred at a time. Thanks.

A: You should be able to load 50,000+ files into a single project. We suspect the problem is that you are using the 'Add Files' dialog. To load a directory of files, use 'Add Folder'. Alternatively, you can open Explorer and drag the files across.

Finally, you can construct a list of files and use 'Add From List'.
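For large directory trees, the list can be generated with a script. A minimal sketch, assuming 'Add From List' accepts a plain text file with one full path per line (the exact format and the UTF-8 encoding are assumptions; compare against a list you build by hand). Pointing `root` at a \\Server\share UNC path keeps the list usable from other machines.

```python
import os


def write_file_list(root, list_path, encoding="utf-8"):
    """Write one full path per line for every file under root; return the count.

    Directories and file names are visited in sorted order so the list is stable.
    """
    count = 0
    with open(list_path, "w", encoding=encoding) as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # sorting in place controls os.walk's descent order
            for name in sorted(filenames):
                out.write(os.path.join(dirpath, name) + "\n")
                count += 1
    return count
```

Write the list file outside `root`, otherwise the list will include itself.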

Q: When I go to convert, there are no output file format types (list is blank) in the Convert dialog.

A: It looks very much like an incorrect installation. The quick fix is to email us the file:
"c:\program files\imagemaker\Discovery assistant\install.log" and we'll try to figure out what went wrong.

Q: My lawyer wants all messages and the attachments to be exported in one TIFF file.

A: We can't quite do that, but here is something that comes close:

To itemize which files belong to which messages:

  • assign Bates numbers and Document IDs
  • messages with attachments are visually identified with a Bates Group Range value.
  • optionally bates stamp the documents.
  • export stamped or unstamped files.
  • output files are named by DocumentID
  • output a CSV file that includes limited details, including Document ID, and ATTACHMENTRANGE. (may also want to include subject, or title).

Q: I'm having a problem processing MSG files. The Outlook dialog comes up, gets dismissed, but the document does not process.

A: Possible fixes:

Try temporarily turning our 'clickyes' program OFF. You can do this by renaming the registry item HKLM\Software\ImageMAKER\DA_ClickButton (add an extra character to the name).

You should now be able to close the dialog manually (and allow access for 10 minutes). If this change works, we'll look further into what the problem might be.

There is an alternative 'close' tool called 'ClickYesSetup' that you can install from the Start / Programs / ImageMAKER Discovery Assistant group. If that is installed, and set to active, and it solves the Outlook Dialog problem, that would be another solution.

Also, if you have any other third-party software set up to close these Outlook dialogs, that too could explain the problem: two applications are trying to close the same dialog.

Q: How do I exclude Outlook attachments from being processed?

A: On the Options / Scan tab, set 'exclude outlook attachments'. When you then add an MSG file, all of its attachments are ignored.

Q: How do I control the formatting of the MSG file to look like it was printed from Outlook?

A: Default behavior is to extract HTML / RTF / TEXT, then use the native application to print that file.

To get 'outlook' formatted output, from Options / Outlook tab, change rendering from 'default' to 'Outlook MSG'.

Q: Can we generate a report or export to a .csv / spread sheet of all files that couldn't be converted, non-convertible or failed, that contains the file name, path, etc.?

A: If you download DAReportManager, that will give you the ability to create a spreadsheet containing all the information from each of the tabs. (contact ImageMAKER for download instructions).

Q: How do I handle foreign character sets (like cyrillic)?

A: As far as we know, Summation and Concordance do not support Unicode. For this reason, metadata and extracted text are exported as MBCS (multi-byte character strings), which Concordance/Summation can handle.

The TIFF files print using the native application. If the native application supports foreign chars sets, then those characters are properly represented in the TIFF file.

To set up for MBCS output:

  1. Go into control panel and select 'Regional and Language Options'

  2. Go to 'Advanced' tab.

  3. In the 'Language for Non-Unicode Programs' drop-down list box, choose 'Russian' for Cyrillic (or another language for a different character set).

  4. In DA set Outlook rendering format to 'default'.

  5. Start the conversion process.

Adding Arabic language support

If the user did not add Arabic language support during the installation wizard, it can be added after installation completes with the following steps:

  1. Double click the Regional and Language Options from the Control Panel folder.

  2. From the Languages tab select the check box “Install files for complex script and right-to-left language”.

  3. From the Regional Options tab, select the required Arabic locale and location.

  4. Press OK button.

You should be able to review the extracted TXT, and metadata using Notepad. Text data is MBCS encoded.

The following MBCS character sets (code pages) are supported:

  • 874 Thai
  • 932 Japan
  • 936 Chinese (PRC, Singapore)
  • 949 Korean
  • 950 Chinese (Taiwan; Hong Kong SAR, PRC)
  • 1200 Unicode (BMP of ISO 10646)
  • 1250 Windows 3.1 Eastern European
  • 1251 Windows 3.1 Cyrillic
  • 1252 Windows 3.1 Latin 1 (U.S., Western Europe)
  • 1253 Windows 3.1 Greek
  • 1254 Windows 3.1 Turkish
  • 1255 Hebrew
  • 1256 Arabic
  • 1257 Baltic
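When exported text is opened with a tool other than Notepad, it has to be decoded with the same code page it was written in. A sketch mapping the code pages above to Python's standard codec names (code page 1200 is UTF-16 little-endian); which code page a particular export used depends on the 'Language for Non-Unicode Programs' setting, so the `code_page` argument is an assumption you supply.

```python
# Windows code page number -> Python codec name.
CODE_PAGES = {
    874: "cp874",       # Thai
    932: "cp932",       # Japanese
    936: "cp936",       # Chinese (PRC, Singapore); alias of gbk
    949: "cp949",       # Korean
    950: "cp950",       # Chinese (Taiwan; Hong Kong SAR)
    1200: "utf-16-le",  # Unicode (UTF-16 little-endian)
    1250: "cp1250",     # Eastern European
    1251: "cp1251",     # Cyrillic
    1252: "cp1252",     # Latin 1
    1253: "cp1253",     # Greek
    1254: "cp1254",     # Turkish
    1255: "cp1255",     # Hebrew
    1256: "cp1256",     # Arabic
    1257: "cp1257",     # Baltic
}


def read_exported_text(path, code_page):
    """Read an exported TXT/metadata file written in the given Windows code page."""
    with open(path, encoding=CODE_PAGES[code_page]) as fh:
        return fh.read()
```

In a single-byte code page such as 1251, each Cyrillic letter takes one byte; in the East Asian code pages, characters take one or two bytes, which is what "multi-byte" refers to.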

Q: I'm wondering if you can give me some additional insight into why there are so many blank pages in this spreadsheet. If you look at it in print preview in Excel, go to page 19 and you will find 19-29 blank. Also 49-60, 79-94, and I quit looking at that point. As far as I can tell, there is no data. Is there something going on with formatting, maybe?

A: In spreadsheets, users usually set a print range of active cells, and that range usually contains data. However, the print range can miss huge swaths of the spreadsheet, and it's those swaths that we want in the discovery process. Discovery Assistant therefore defaults to printing the entire sheet, blank pages and all. The entire sheet is defined as the largest box bounded by the top-left cell and the bottom-right cell.

Discovery Assistant spreadsheet formatting is controlled through the Admin / Configure / Excel Options page. The following settings are supported:

  • Set all worksheets to active before converting
  • Clear print area before converting (print all cells)
  • Clear headers
  • Clear footers
  • Orientation
  • Scale
  • Comments
  • Order
  • Print Quality
  • Paper Size

Q: Quick question on the hashes that Discovery Assistant produces. A client of mine is insisting that the MD5 hash should not have any dashes in it, though the hash values Discovery Assistant produces do. If the dashes are removed via search and replace, would it still represent the correct hash value?

A: Yes, it's still valid. We put the dashes in so the number is human readable.
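Removing the dashes is safe because they are display-only separators. A minimal check: strip the dashes, lower-case the hex digits, and compare against Python's `hashlib.md5`. The dash grouping in the sample value below is our own choice for illustration and is not necessarily the exact layout Discovery Assistant uses.

```python
import hashlib


def normalize_md5(value):
    """Remove display dashes and spaces, and lower-case the hex digits."""
    return value.replace("-", "").replace(" ", "").lower()


def md5_matches(dashed_value, data):
    """True if a dashed/upper-case MD5 string is the MD5 of the given bytes."""
    return normalize_md5(dashed_value) == hashlib.md5(data).hexdigest()


# MD5 of empty input, written with dashes for readability.
print(md5_matches("D41D8CD9-8F00B204-E9800998-ECF8427E", b""))  # True
```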

One other point: you can customize how much of each file is hashed. If hashes match, we then do a binary comparison. If you are doing global deduplication using the hash value, make sure the Options / De-duping / HashCode sample size is set to 0.
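The sample-size setting trades accuracy for speed: hashing only the first N bytes is cheap, but two different files can share their first N bytes, which is why a hash match is confirmed with a binary comparison. A minimal sketch of that two-step idea, assuming a sample size of 0 means "hash the whole file" (function names are hypothetical):

```python
import filecmp
import hashlib


def sample_hash(path, sample_size=0, chunk=1 << 20):
    """MD5 of the first sample_size bytes of a file (0 = hash the whole file)."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        if sample_size:
            md5.update(fh.read(sample_size))
        else:
            # Read in chunks so large files don't have to fit in memory.
            for block in iter(lambda: fh.read(chunk), b""):
                md5.update(block)
    return md5.hexdigest()


def same_file(a, b, sample_size=0):
    """Cheap hash comparison first, then a byte-for-byte comparison to confirm."""
    if sample_hash(a, sample_size) != sample_hash(b, sample_size):
        return False
    return filecmp.cmp(a, b, shallow=False)
```

When the hash value alone is used as the deduplication key, there is no binary comparison to catch a false match, so the hash needs to cover the whole file.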

Q: The word document I am converting contains 2 pages, but when I process I only see one page.

A: If you see only one page in the TIFF file, and that page contains everything that the original doc contains, then I have a very good explanation...

When you switch print drivers, Word documents reformat slightly. You can see this by changing the default printer in Word while a Word document is open: the text position changes slightly.

The same file may print as two pages on an HP printer and as only one page on a PostScript printer.

To get exact equivalence between the two printers, we need:

  • to be using the same font set. PostScript fonts are different from the TrueType fonts that we use.
  • print margins on the printers must be the same.
  • resolution must be the same.

Q: What is 'child next' order when assigning Bates numbers? Also, do you ever assign the same Bates number to more than one document?

A: We load the files and attachments in a slightly different order from how case management systems request them.

We load the first layer of attachments, then go back looking for attachments to the attachments. The 'proper' way of importing/exporting, however, is to list the first attachment and its children before listing the second attachment. For lack of a better term, we call this 'child next' order.

To get the right export order, we recommend assigning Bates numbers in child next order, then sorting on Bates number, then exporting. Note that we also fill in the Bates range for each message at this point, which lets you confirm that everything is listed in the right order.
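The two orders correspond to breadth-first and depth-first traversals of a message family. A sketch with a hypothetical family (one Bates number per document, ignoring page counts for simplicity) shows why numbering in child next order and then sorting on Bates number gives the export order:

```python
from collections import deque

# Hypothetical family: a message with two attachments; the first attachment
# (a ZIP) contains two files of its own.
TREE = {
    "MSG": ["ATT1.zip", "ATT2.doc"],
    "ATT1.zip": ["ATT1a.xls", "ATT1b.pdf"],
}


def load_order(root):
    """Breadth-first: first layer of attachments, then attachments of attachments."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(TREE.get(node, []))
    return order


def child_next_order(root):
    """Depth-first: each attachment is immediately followed by its own children."""
    order = [root]
    for child in TREE.get(root, []):
        order.extend(child_next_order(child))
    return order


def assign_bates(order, start=1):
    return {name: start + i for i, name in enumerate(order)}


bates = assign_bates(child_next_order("MSG"))
print(load_order("MSG"))
print(sorted(load_order("MSG"), key=bates.get))  # same as child_next_order("MSG")
```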

As for handling duplicates, you have 3 choices at time of conversion:

  • ignore (or copy) duplicates.
  • skip duplicates
  • link duplicates.

If you ignore (or copy) duplicates, each file gets its own Bates number. If you skip duplicates, only the first converted file is assigned a Bates number. If you link duplicates, all of the duplicates are assigned the same Bates number.
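The three policies can be sketched as follows, assuming documents arrive as (file_id, hash) pairs in conversion order and that each document gets one Bates number (real numbering is per page). The function is hypothetical and only illustrates the rules described above.

```python
def bates_by_policy(docs, policy, start=1):
    """Assign one Bates number per document under a duplicate policy.

    docs:   (file_id, hash) pairs in conversion order.
    policy: 'ignore' - every file gets its own number (duplicates included)
            'skip'   - only the first copy of each hash gets a number
            'link'   - later copies share the first copy's number
    """
    numbers, first_seen, next_no = {}, {}, start
    for file_id, digest in docs:
        if policy != "ignore" and digest in first_seen:
            if policy == "link":
                numbers[file_id] = numbers[first_seen[digest]]
            continue  # 'skip' leaves the duplicate unnumbered
        numbers[file_id] = next_no
        first_seen.setdefault(digest, file_id)
        next_no += 1
    return numbers


docs = [("A", "h1"), ("B", "h2"), ("C", "h1")]
for policy in ("ignore", "skip", "link"):
    print(policy, bates_by_policy(docs, policy))
```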

Q: What process is used to identify whether a file is readable? Is a listing such as the NSRL used? Does it use the file's extension, or does it examine the file to determine what it is? (For example, if an Excel spreadsheet has a .AAA extension instead of .XLS, will it still be recognized as an Excel spreadsheet and treated accordingly?)

A: Process by which we recognize the file type is:

  1. Use the contents of fAssoctable.txt for a first pass. If a document has the extension DOC but is really a zip file, we mark the file as a zip file.

    If the file type isn't recognized, then we keep current extension.

  2. Check if there is a file association set on the computer for the file type. If set, then we assume the file type. If no file association found, then we route the file to the 'unconvertible' tab.

  3. Attempt to do a conversion on queued files. If any conversions fail, then queue those files in the failed folder.

  4. A human operator can review the unconvertible files. If the file type isn't supported, users can look it up at http://filext.com or http://www.nsrl.nist.gov/, then acquire the owner application, install it, and do a 'recheck' to re-check unrecognized files.

    The operator also has the option of manually assigning a file type. Otherwise, files can be sent through using the 'pass through' feature.

  5. A human operator can review failed files and manually 'pass through' those you want to keep. (Note: there is currently no way to manually assign a type to failed files.)
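Step 1 relies on the fact that many formats begin with a fixed signature (a "magic number"). A minimal sketch of signature-based detection follows; the signatures are standard published values, but the table is our own and is not the contents of fAssoctable.txt. Note that the OLE2 signature is shared by .doc, .xls, .ppt, and .msg, so a real detector has to look deeper to tell those apart.

```python
# A few well-known file signatures ("magic numbers") and the type they imply.
SIGNATURES = [
    (b"PK\x03\x04", "zip"),                        # ZIP (also .docx/.xlsx containers)
    (b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", "ole"),  # OLE2: .doc/.xls/.ppt/.msg
    (b"%PDF", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\xFF\xD8\xFF", "jpg"),
    (b"II*\x00", "tif"),                           # little-endian TIFF
    (b"MM\x00*", "tif"),                           # big-endian TIFF
]


def sniff_type(path, current_ext):
    """Return a type based on the file's leading bytes; if no signature
    matches, keep the current extension (as in step 1 above)."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    for magic, kind in SIGNATURES:
        if head.startswith(magic):
            return kind
    return current_ext
```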

Q: There is a process that a lot of law firms are doing and I was wondering if you had a way to accomplish this. In order to save money the law firms will request that we only deliver the metadata and OCR, no images. Then, they tag the responsive docs they want and ask us to only TIFF those. Any ideas?

A: When converting, you can choose:

  • MetaData only (requires no conversion, and is very fast).
  • MetaData and text (requires conversion, but we don't spend the time to produce a tiff file).
  • MetaData, Text and TIFF (no need to do this just yet).

If you are dealing with just email, the email text is extracted as part of the MetaData. That might be good enough for your needs (and is fast). The only time you would want MetaData and text is if you want the text contents of attachments and of loose documents.

Once you've done the conversion, you can export that data to Concordance/Summation/IPRO/CSV for analysis. The trick is to remember to include FileID as one of the export fields.

The user imports the data into Concordance/Summation/case management system, analyzes it, then exports back out a list of documents that they want TIFFed. That list needs to include FileID as one of the record items. Using a text editor or Excel, you should be able to remove any extra data, creating a TXT file with just one FileID per line.
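Instead of trimming the review export by hand, a short script can pull out the FileID column. A sketch assuming the export is a comma-separated file with a header row containing a 'FileID' column; the column name and UTF-8 encoding are assumptions to check against your own file.

```python
import csv


def write_fileid_list(review_csv, out_txt, column="FileID"):
    """Copy the FileID column of a review export into a TXT file, one ID per line.

    Blank IDs are skipped and repeated IDs are written once. Returns the count written.
    """
    seen = set()
    with open(review_csv, newline="", encoding="utf-8") as src, \
         open(out_txt, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            file_id = (row.get(column) or "").strip()
            if file_id and file_id not in seen:
                seen.add(file_id)
                dst.write(file_id + "\n")
    return len(seen)
```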

Next, re-open the Discovery Assistant project, create a copy of it (Save As), then remove all converted/queued items. Then, from the menu, select Project / Queue from FileID list. Open the FileID list you just created; only the items with FileIDs listed will be queued for conversion.

Convert, bates stamp, export, and you're done.

One advantage of this method is that the client needs to use your services twice: once to get the metadata with FileIDs, and a second time to get the TIFF files. It would be difficult for the client to identify what documents to process without first getting you to do the extraction. If the client were to give you loose files, they would run the risk of losing the file relationship information, and potentially changing the data as items are extracted from the original source documents.

Q: What caliber of person is required to competently operate the DA software?

A: The product needs to be set up and run by an IT knowledgeable person (2+ years experience).

Most problems that we see are 'setup' problems - getting the product up and running in a production environment. Startup issues we see are:

  • not enough memory or hard drive space.
  • needs additional programs installed.
  • problems with existing installed programs.
  • learning the product (there is a lot to review).
  • understanding the conversion process.
  • tricks and tips.

Once the product is up and running, relatively junior people with little or no IT experience can run it. If a problem comes up, then the senior IT person needs to look at the problem, then call us if it can't be solved locally.

Q: What training programs do you offer?

A: A person with IT experience can get the product up and running without any additional training on our part.

That IT person in turn needs to train the users and be available to answer their questions. User training by anyone other than the IT person is difficult, as it is the IT person who will have to answer the first-line questions and do the re-installs and first-line troubleshooting.

Q: What database architecture are you guys using, SQL? Access? And do you have any sort of distributed processing for large jobs that I may want to spread over multiple machines?

A: We use an XML file loaded into memory to get the fastest possible database processing speed. The maximum practical size of an XML project file is 500,000 files.

To handle data sets larger than half a million files, we provide an add-on tool called Terabite. It builds an MDB file containing millions of files, then exports this list to multiple 'Discovery Assistant load files' by breaking it down by bytes (2 GB) or by number of files (100,000); both limits are user configurable.
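Terabite's exact splitting rule isn't described here; a greedy split by cumulative size or file count, with both limits configurable, could be sketched as follows (the function is hypothetical):

```python
def split_batches(files, max_bytes=2 * 1024**3, max_files=100_000):
    """Greedily split (path, size_in_bytes) pairs into batches.

    A new batch starts when adding the next file would exceed either limit.
    A single file larger than max_bytes ends up in a batch of its own.
    """
    batches, current, current_bytes = [], [], 0
    for path, size in files:
        if current and (len(current) >= max_files or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


files = [("a", 40), ("b", 40), ("c", 40), ("d", 10), ("e", 200)]
print(split_batches(files, max_bytes=100, max_files=3))  # [['a', 'b'], ['c', 'd'], ['e']]
```

Each resulting batch would then become one load file / project.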

To set up multiple jobs, break the data into smaller projects, (possibly using our Terabite program to automate the process), then load and process multiple projects on multiple machines. As long as the source files are referenced with a \\Server\share UNC name, it doesn't matter what machine is used to do the processing.

In addition to the Terabite tool, we also provide a tool to map XML project files back to XLS or MDB to facilitate reporting.

Q: What are the limitations as to the maximum number of files to be processed in a batch?

A: Our current recommendation is to keep the number of files per project batch to less than 500,000 files. When processing the batch, to optimize speed we load the file list into memory. As the list size grows, the time to load/update/save the list gets increasingly longer. 500,000 files seems to be a practical limitation. There are no physical limits on file sizes or number of pages per file.

Recognizing that different data sets have different file densities, the rule of thumb we use is each gigabyte of data translates to 70,000 converted pages. At an approximate conversion speed of 1 page per second, a single copy of Discovery Assistant should be able to process 3,600 pages an hour, or a gigabyte of data every 20 hours.
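The rule of thumb above can be turned into a quick planning estimate. A minimal sketch using the numbers from the text (70,000 pages per GB, about 1 second per page); real throughput depends on the data and the machine.

```python
# Rules of thumb from the text; real throughput depends on the data and machine.
PAGES_PER_GB = 70_000
SECONDS_PER_PAGE = 1.0


def processing_hours(gigabytes, seconds_per_page=SECONDS_PER_PAGE):
    """Estimated hours for one Discovery Assistant machine to convert the data."""
    pages = gigabytes * PAGES_PER_GB
    return pages * seconds_per_page / 3600


print(f"{processing_hours(1):.1f} hours per GB")        # 19.4 hours per GB
print(f"{3600 / SECONDS_PER_PAGE:.0f} pages per hour")  # 3600 pages per hour
```

Passing `seconds_per_page=2` roughly accounts for the extra second a page for import/export/deblanking/Bates numbering mentioned in the answer on processing time per GB.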

Our recommendation is to keep list sizes to 100,000 files or a 2 gigabyte maximum. That keeps job processing at less than 24 hours per job.

Q: How do you process Word files that have markups within the DOC file?

A: The default conversion process converts the document similar to how it was saved. If markups are displayed when the document was last saved, then the markups are printed. There is a difference in how the markups print based on whether you are using Office 2000, or Office 2003. Office 2003 prints much more information about markup changes than Office 2000.

Q: How do you handle password protected files?

A: If the file is password protected, our current default behavior is to time out waiting for the application to print. We then kill the application. The default timeout value is 30 seconds. If there are a lot of password protected files, then conversion is going to go very slowly.

Failed files can be moved to another directory and then set up for password cracking. Our understanding is that cracking a password can take multiple hours per file, so it is not something to try in real time.

At some point in the future, we'll look at trying to determine if a file is password protected before attempting the conversion.

Note: there are a number of 3rd party applications designed to handle password detection and cracking for: Excel, Access, Word, RAR, PDF, Outlook.

Detection:
http://www.ozgrid.com/Services/find-protected-files.htm

Cracking:
http://www.ozgrid.com/Services/access-password-recover.htm

Q: What are the benefits of paying Maintenance and Support?

A: Having paid up maintenance ensures that you have continued access to developer support, and that we can help you with any problems that come up. We understand that your business is to convert documents to TIFF for your client, and that if you have problems, you need them fixed as quickly as possible. If you have specialized requirements, our developers are also available to do custom development.

Q: How do I add support for GIF files?

A: On Windows XP and Windows 2003, the Windows Picture and Fax Viewer can do the job. To set it as the default, go into Explorer, search for a GIF file, then open it. At that point, the file association will be set. You can then do a re-check from Discovery Assistant, and the GIF files will be convertible. The same process works for JPEG.

On a Windows 2000 machine, run the Imaging For Windows application, and set the menu item: Tools / General Options - open images in Imaging.

Q: How do I change from using UltraEdit back to making Notepad the default TXT viewer?

A: UltraEdit file associations can be difficult to override, especially if you want to keep using UltraEdit for other purposes.

The fix is to go into Discovery Assistant / Admin / Configure / Documents, look for .TXT, and select 'Modify'.

Next, put the following command into the 'override cmd' edit box at the bottom of the dialog:
%SystemRoot%\system32\notepad.exe /pt "%1" "%2" "%3" "%4"

Note: make sure that Notepad's File / Page Setup has the header and footer removed.

Q: What is the best setup? workstations writing to a central SQL DB? Workstation with discovery assistant and SQL local?

A: Ideal setup is as follows:

  1. project is broken down into manageable 1 GB chunks (less than 200,000 files)

  2. files are loaded into a Discovery Assistant project file. Ideally, the data resides on a server, and references to that server are UNC based (\\server\share).

  3. project files are converted on one or more machines.

  4. problem files are dealt with.

  5. Bates numbers are assigned starting with project 1.

  6. converted files are exported out to a standard load file format.

  7. a separate application can convert XML project files to XLS spreadsheet files for reporting and record keeping: what got converted, what failed, what the source directories are, etc. (I need to send you this application; we will be incorporating it into the next Discovery Assistant drop.)

  8. load files are loaded back into a SQL database.

Discovery Assistant's native database is XML (a flat file). We load the whole file into memory, so all database operations are fast. At one point we modeled using an SQL database from within Discovery Assistant; however, access was slow, which made it difficult to assign large groups of files to different status values, etc.
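DAReportManager does the XML-to-spreadsheet conversion for real; as a rough illustration of the in-memory approach, the sketch below loads a whole project document, filters it, and writes a CSV report (for example, all failed files). The element and attribute names (`Project`, `File`, `id`, `status`, `path`) are hypothetical; the actual Discovery Assistant project schema is not described here.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical project layout, for illustration only.
SAMPLE = r"""<Project>
  <File id="1" status="Converted" path="\\server\share\a.doc"/>
  <File id="2" status="Failed" path="\\server\share\b.xls"/>
</Project>"""


def project_to_csv(xml_text, status=None, fields=("id", "status", "path")):
    """Load the whole project into memory and return CSV text with one row per
    File element, optionally keeping only files with the given status."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    for node in root.iter("File"):
        if status is None or node.get("status") == status:
            writer.writerow([node.get(f, "") for f in fields])
    return out.getvalue()


print(project_to_csv(SAMPLE, status="Failed"))
```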

Q: What are the best practices for planning a multi-batched project?

A: We've been developing the tools to manage terabytes of data. We believe best practices are as follows:

  1. enumerate in a TXT file (or simple flat database file) a list of all files to be processed.

  2. Break this file into smaller 'load files' based on a maximum number of items (50,000), or maximum total cumulative file size (1 gig).

  3. Load these TXT files into DiscoveryAssistant to build fully functional project files.

  4. Do the conversion of the project files

  5. Export the project files as Load files

  6. Load the converted files back into the Discovery application of choice.

We've found that by decoupling the process from an SQL database and using XML load files managed entirely in memory, our ability to access data, compare files for duplicates, and manage project queues of 50,000+ individual files is significantly faster than with traditional database access.

Q: What is the procedure for merging batches and renumbering?

A: Quick answer is that you need to first convert before assigning bates numbers.

We suggest that if batches are numbered and delivered in that order, Bates numbering can occur concurrently with the delivery of petrified data. Multiple batches can be converted at the same time, but Batch 2 can't be Bates numbered until Batch 1 conversion is complete.

In the event that files have to be redone due to a customer request, there are a number of options that let you set Bates numbering to whatever start number is decided upon.
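The dependency between batches comes from Bates numbers running sequentially across batches: a batch's first number is one past the previous batch's last page, so it is only known once the earlier batches are converted and their page counts are final. A sketch with a hypothetical 'ABC' prefix and six-digit padding:

```python
def batch_bates_ranges(page_counts, start=1, prefix="ABC", width=6):
    """Return (first, last) Bates numbers for each batch, in delivery order.

    Each batch starts one past the previous batch's last page, so a batch can
    only be numbered once every earlier batch's page count is final.
    """
    ranges, next_no = [], start
    for pages in page_counts:
        first, last = next_no, next_no + pages - 1
        ranges.append((f"{prefix}{first:0{width}d}", f"{prefix}{last:0{width}d}"))
        next_no = last + 1
    return ranges


print(batch_bates_ranges([100, 250]))
# [('ABC000001', 'ABC000100'), ('ABC000101', 'ABC000350')]
```

Changing `start` corresponds to choosing a different start number when a batch has to be redone.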

Q: What QC functions can be run on processed data while another batch is being processed (on the same machine) if any? or does the current batch need all local resource to be as fast as possible?

A: XML files are used to define the project. If you want to QC a project as it is being converted, then you have to do it in the active conversion session. If you want to QC a project that has already been converted, then this can be done on a second machine. User can open the project file, then view the converted data.

You can install as many 'QC' versions of Discovery Assistant as you like, for no extra charge. The QC versions do not convert, but can 're-queue' files to be converted.

Q: Keyword Searching?

A: Not something we currently offer. It's best to export the document MetaData to Concordance or Summation, and then work from there.

As an alternative, it is possible to complete the conversion, then use Google Desktop to index the resulting files.
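For a quick check on converted data without a full review platform, the exported TXT files can also be scanned directly. This is a minimal sketch, not a Discovery Assistant feature; it assumes the text was exported in code page 1252 (adjust `encoding` to match the 'Language for Non-Unicode Programs' setting used at export time).

```python
import os
import re


def search_text_exports(root, terms, encoding="cp1252"):
    """Return {path: sorted matched terms} for exported .txt files under root.

    Matching is case-insensitive and on whole words only.
    """
    patterns = {t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms}
    hits = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".txt"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding=encoding, errors="replace") as fh:
                text = fh.read()
            found = sorted(t for t, p in patterns.items() if p.search(text))
            if found:
                hits[path] = found
    return hits
```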

Q: What are the settings for TimeOuts in Discovery Assistant?

A: Default values are (in seconds):

HKLM\Software\ImageMaker\xdc\Settings

  • TimeoutFirstPage (INT): 120
  • TimeoutNextPage (INT): 120
  • TimeoutStart (INT): 60
  • TimeoutTotal (INT): 600
  • TimeoutMaxPages (INT): 0
  • TimeoutPrintQueue (INT): 60

If you want to make the timeout infinite (for really large multi-page files):

Discovery Assistant / Admin / Configure / Timeouts
Set Timeout Total to 0 (infinite).

If you are running really large files, we find that the spooler sometimes runs out of room. If so, open the print driver's Advanced settings (Start / Settings / Printers / Properties / Advanced tab) and change from "spool print documents" to "Print directly to the printer".

Q: When Discovery Assistant prevents date macros from putting today's date into Word and Excel documents, what date, if any, actually appears in the petrified image? Is the date macro removed entirely so that no date appears? Does the last-modified date appear?

A: We switch the machine date to the date of the document being rendered (the date it was last saved). The macro still runs, and the date gets filled in. It's difficult to disable ALL macros, and it's also misleading to not put in a date.

When we output Word or Excel documents, we set UpdateFields=false (using WordPrintTo.exe and ExcelPrintTo.exe) so date fields don't change from their last value. This operates independently of the feature that temporarily changes the system date (enabled through the admin interface). Changing the system date is usually only required if the conversion process produces headers/footers with the date/time and these need to reflect the last saved date/time. Headers/footers can usually be turned off for most documents by opening the parent application (Word, Excel, etc.) with a blank document, selecting Page Setup from the File menu, and removing the header/footer.

NOTE: for this feature to work, you must turn ON the Date Handler in Discovery Assistant. To do this, go to the Admin dialog, select Configure, and under the Options tab, select 'Reset System Time to LastWrite Time before conversion.'

Warning: with this option turned on, do not use this machine during conversion for other business functions.

Q: My PST file contains 27,000 emails, each of which contains a signature file. In total there are about 10 different signature files. When I export to Summation or Concordance, the list of duplicates for each signature file reference is enormous. The DII file itself is larger than 80 gigabytes. What can I do?

A: Go back to the AllFiles tab. You should be able to locate one or more of these common signature files. Identify the FileID and Hash value, then sort on hash value. Once you've re-located the signature file in question, delete all the copies of this file from the data set (OK to leave one). You may want to make a note of the number of copies you've deleted. Do this for any of the other signature files that are a problem. Then, go back and try re-exporting. The Data file should now be much smaller.

Q: Recently, my system has been crashing more often, due to a process or program "CiceroUIWndFrame".

A: I found out that this is the "Speech and Handwriting Recognition" component of Office XP. To uninstall it:

  1. "Control Panel"

  2. "Add/Remove Programs"

  3. "Microsoft Office," click on the "Change" button

  4. Browse to "Office Shared Features," then "Alternative User Input," and for both Speech and Handwriting Recognition select "Not available" from the drop-down box.

  5. Apply the change, and the CiceroUIWndFrame messages disappear. It takes just a second, and you don't need to restart your system or insert the Office XP CD.

  6. Under Windows 2003, the Control Panel applet is Text Services. Remove the Handwriting Recognition services there.

Q: If I have a job already stamped but don't want to print the whole thing, is there a way to select which pages to print? The computer and/or printer caused the print job to stop midway through (I assume the software didn't cause this).

A: To print just one file from the list:

Double-click the image in the Stamped tab to bring up our viewer.
You can then print that image, specifying which pages you want printed.

To print all pages of multiple files in the list:

Select the files you want to print from the list (Shift-click or Ctrl-click), then click the Print button. It should print only the files you've selected.

To print selected pages of multiple files in the list:

Double-click each file and use the viewer to print the pages you need.

Q: Do you have any information on performance, e.g. the number of files converted per minute?

A: There are no hard numbers for speed. We tested 7 Word files (simple graphics, lots of text) with the following page counts: 3, 71, 3, 5, 16, 3, 204.

Test machine: 3.2 GHz, no hyper-threading, 1 GB of memory, and a large hard drive.

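| Output DPI | Output Format | Pages per Minute | Pages per Minute without last file (204 pages) | Notes |
|---|---|---|---|---|
| 300 | G4 | 84 | 70 | |
| 300 | G3 | 87 | 70 | |
| 300 | G3 | 257 | 206 | Dithering set to Windows Fast Dither |
| 200 | G4 | 150 | 130 | |
| 200 | G3 | 150 | 130 | |
| 200 | G3 | 332 | 270 | Dithering set to Windows Fast Dither |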

At 300 dpi, pages per minute ranges from 70 - 84. For smaller files (one page per file), speeds as low as 30 pages per minute have been seen.

At 200 dpi, pages per minute ranges from 130 - 150. For smaller files (one page per file), conversion speed can drop as low as 30 pages per minute.

For files containing a large number of pages, conversion speeds are somewhere above 200 pages per minute.
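As a rough planning aid, the figures above can be turned into a time estimate by dividing page count by throughput. A minimal sketch, assuming a constant pages-per-minute rate; the measurements show the rate varies with file size and settings, so treat the results as ballpark only. The function name is illustrative.

```python
# Page counts of the 7 Word files in the test set, in the order listed above.
TEST_PAGE_COUNTS = [3, 71, 3, 5, 16, 3, 204]


def conversion_minutes(pages: int, pages_per_minute: float) -> float:
    """Estimated conversion time, assuming throughput stays constant for the whole job."""
    if pages_per_minute <= 0:
        raise ValueError("pages_per_minute must be positive")
    return pages / pages_per_minute


total_pages = sum(TEST_PAGE_COUNTS)               # whole test set
without_last = sum(TEST_PAGE_COUNTS[:-1])         # test set minus the 204-page file
print(total_pages, without_last)                  # 305 101
print(round(conversion_minutes(305, 84), 2))      # 3.63 (300 dpi, G4)
print(conversion_minutes(70_000, 70) / 60)        # hours for a 70,000-page job at 70 ppm
```

For example, a 70,000-page job at 70 pages per minute comes to 1,000 minutes (about 16.7 hours) of conversion time under this simple model.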

Conversion speed can be increased dramatically (doubled) if you switch to Windows Fast Dithering as the dithering option (Printing Properties / General Tab / Printing Preferences / Advanced / Image Rendering Options - color mode).

Basic trend:

  • Speed is greatly enhanced by setting the default dither output to 'Windows Fast Dither' (image quality can be slightly compromised).
  • More graphically complicated files take longer to convert.
  • The higher the output resolution, the slower the conversion.
  • Additional processing for MSG and PST results in slower conversion.
  • There is a slight performance penalty for saving in G4 format.

Windows Fast Dither uses a smaller memory area for conversion and dithers the text and graphics to black and white as they are written to the surface. The Error Diffusion Dithering method dithers the whole image when it is written to file (and can use significantly more memory).
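The memory difference described above follows from how each family of dithering works. The exact algorithm behind Windows Fast Dither is not documented here, so the sketch below uses ordered (Bayer) dithering as a stand-in for a per-pixel method and Floyd-Steinberg as a generic error-diffusion method; neither is claimed to be what the print driver does. Ordered dithering decides each pixel from its own value and position, so rows can be dithered as they stream in. Error diffusion pushes each pixel's rounding error onto its neighbors, so it needs state beyond the current pixel (this sketch simply keeps the whole image buffered). Pixels are grayscale 0-255; output 1 means white, 0 means black.

```python
# 4x4 Bayer matrix: each value 0..15 appears exactly once.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def ordered_dither_row(row, y):
    """Dither one row independently: needs only each pixel's value and (x, y) position."""
    out = []
    for x, value in enumerate(row):
        threshold = (BAYER_4X4[y % 4][x % 4] + 0.5) * 16  # maps 0..15 to 8..248
        out.append(1 if value >= threshold else 0)
    return out


def floyd_steinberg(image):
    """Error diffusion: each pixel's quantization error is spread to unvisited
    neighbours, so a working buffer is needed (here, the whole image)."""
    h = len(image)
    w = len(image[0]) if h else 0
    buf = [[float(v) for v in row] for row in image]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = buf[y][x]
            white = old >= 128
            out[y][x] = 1 if white else 0
            err = old - (255.0 if white else 0.0)
            if x + 1 < w:
                buf[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    buf[y + 1][x - 1] += err * 3 / 16
                buf[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    buf[y + 1][x + 1] += err * 1 / 16
    return out
```

Both turn a mid-gray area into roughly half white and half black pixels; the difference the FAQ describes is in how much of the image must be held in memory while doing it.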

The default for the ODC Carrier is to set 'Windows Fast Dither' on.

Q: One problem I have run into is that DA has difficulty with .pdf files larger than 500 pages; I often get an error message when I try to convert one. What can I do?

A: I believe the PDF problem is related to us running out of spool file room. When printing, the PDF application puts data into the spooler much faster than we can pull it out.

The quick fix is to go into the printer properties / Advanced tab for the ImageMAKER XDC Service1 and set the spooler properties to 'Print directly to the printer'. This means that as soon as data goes into the spooler queue, we take it back out. Acrobat will remain open much longer.

The only problems we are aware of with leaving this setting on permanently are that some Word documents with mixed landscape/portrait pages don't print correctly, and some documents print more slowly.

Q: How does your software handle error reporting, if at all?

A: When you import files to be converted, the product first separates the files that can be converted from those that can't.

Then, if any conversions fail, the failed files are listed in the 'failure' tab. Converted files are listed in the 'converted' tab. Files can be re-converted by moving them back into the 'queued' tab.
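The queued / converted / failure workflow can be modeled as a small state machine. The sketch below is hypothetical: the class and method names are illustrative, not Discovery Assistant's API, and `convert` stands in for whatever performs a single conversion and returns True on success.

```python
from enum import Enum


class Status(Enum):
    QUEUED = "queued"
    CONVERTED = "converted"
    FAILED = "failure"


class ConversionJob:
    """Tracks which 'tab' each file belongs to."""

    def __init__(self, items):
        self.status = {item: Status.QUEUED for item in items}

    def run(self, convert):
        """Attempt every queued item once; exceptions count as failures."""
        for item, state in list(self.status.items()):
            if state is not Status.QUEUED:
                continue
            try:
                ok = convert(item)
            except Exception:
                ok = False
            self.status[item] = Status.CONVERTED if ok else Status.FAILED

    def requeue_failures(self):
        """Equivalent of moving failed files back into the 'queued' tab."""
        for item, state in self.status.items():
            if state is Status.FAILED:
                self.status[item] = Status.QUEUED

    def tab(self, state):
        """Files currently listed in the given tab, sorted by name."""
        return sorted(item for item, s in self.status.items() if s is state)
```

After `requeue_failures()`, a second `run()` touches only the files that failed the first time.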

We've tested file lists of up to 250,000 files with no problems. If you need to convert larger numbers of files, we suggest breaking them into sets of 100,000.
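Splitting a large file list into sets of 100,000, as suggested above, is a simple slicing operation. A minimal sketch; `batches` is a hypothetical helper, and how each set is then imported into Discovery Assistant is outside its scope.

```python
def batches(items, size=100_000):
    """Yield consecutive slices of `items`, each with at most `size` entries."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


file_list = [f"file{i:06d}.doc" for i in range(250_000)]  # hypothetical file names
print([len(batch) for batch in batches(file_list)])      # [100000, 100000, 50000]
```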

Q: Does the software deduplicate emails? If so, based on what criteria?

A: We've built in automatic duplicate removal. We create a checksum for each item (email, attachment, or file) and compare it against those of all other items in the list. Before confirming a match, we do a complete byte compare between the two files. For emails, we compare the 'text' contents rather than the whole MSG file (every MSG file is unique). The user then has the option of converting the duplicates (or not) and exporting the duplicates (or not).
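The two-stage comparison described above (checksum first, then a full byte compare before confirming, with emails compared on their text rather than the MSG bytes) can be sketched as follows. Assumptions: items are dicts with hypothetical `id`, `kind`, `data`, and `text` keys, and SHA-1 stands in for whatever checksum the product actually uses.

```python
import hashlib
from collections import defaultdict


def comparison_bytes(item):
    """Bytes compared for deduplication: an email's text content (each MSG file is
    unique, so its raw bytes never match), or the raw bytes of any other item."""
    if item["kind"] == "email":
        return item["text"].encode("utf-8")
    return item["data"]


def find_duplicates(items):
    """Return (duplicate_id, original_id) pairs, keeping the first occurrence."""
    kept = defaultdict(list)  # (kind, checksum) -> [(id, bytes)] of kept originals
    duplicates = []
    for item in items:
        content = comparison_bytes(item)
        key = (item["kind"], hashlib.sha1(content).hexdigest())
        for original_id, original_content in kept[key]:
            if original_content == content:  # full byte compare confirms the match
                duplicates.append((item["id"], original_id))
                break
        else:  # no confirmed match: this item becomes a new original
            kept[key].append((item["id"], content))
    return duplicates
```

The byte compare guards against the (rare) case of two different items sharing a checksum; the returned pairs are what a user would then choose to convert or export, or not.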

Q: How does the software handle Excel spreadsheets? Does it remove blank pages automatically?

A: We've developed some specialized Excel spreadsheet software to do the following:

  • print all sheets, not just the currently active one
  • print the defined print range; if no range is defined, print the entire sheet
  • let the user limit the number of pages wide and pages high so that large sheets fit into a predefined number of pages (Admin/Configure/Excel Options)
  • We have just completed development of a 'blank page removal' tool. We can provide you with a stand-alone demo of this, and will incorporate the feature into Discovery Assistant after more testing.
  • We have prototype tools in-house that dump out the formula contents of a spreadsheet on a cell-by-cell basis. Our understanding is that this exposes all 'text' in the spreadsheet for lexical analysis.
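To show what a cell-by-cell formula dump involves, here is a minimal sketch using only the Python standard library. It is not the in-house tool mentioned above. It assumes the .xlsx (Office Open XML) format, where each worksheet is an XML part inside a ZIP archive and formulas are stored in `<f>` elements; legacy binary .xls files would need a different reader.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by .xlsx worksheet parts.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
CELL_TAG = f"{{{MAIN_NS}}}c"
FORMULA_TAG = f"{{{MAIN_NS}}}f"
SHEET_PART = re.compile(r"xl/worksheets/sheet\d+\.xml$")


def dump_formulas(xlsx):
    """Yield (sheet_part, cell_ref, formula_text) for each cell that stores formula text.

    `xlsx` may be a file path or a binary file-like object. Cells that only
    reference a shared formula (an empty <f> element) are skipped in this sketch.
    """
    with zipfile.ZipFile(xlsx) as zf:
        for part in sorted(n for n in zf.namelist() if SHEET_PART.match(n)):
            root = ET.fromstring(zf.read(part))
            for cell in root.iter(CELL_TAG):
                formula = cell.find(FORMULA_TAG)
                if formula is not None and formula.text:
                    yield part, cell.get("r"), formula.text
```

Usage: `for part, ref, text in dump_formulas("book.xlsx"): print(part, ref, text)`, where `book.xlsx` is a placeholder file name.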

Q: How do I ensure there is white space for the Bates stamp, so that the stamp does not obscure the underlying data?

A: The solution is to do the conversion into a smaller area than the original image size.

The print driver has a setting that modifies the page margins. The default is 0 margins (output TIFF image size is the same as input TIFF image size).

Go to Settings / Printers. Select properties for ImageMaker XDC Service1. Under the Device Settings tab, look for unprintable regions. Set the unprintable regions so that you have room for the bates stamp.

We suggest the following settings:

  • top: 0
  • right: .25
  • left: .25
  • bottom: .5
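To see how much room the suggested margins leave, it helps to convert them to pixels at the output resolution. A minimal sketch, assuming the driver's unprintable-region values are in inches and that page content is scaled proportionally to fit the remaining area (neither is stated above); the function names are hypothetical.

```python
SUGGESTED_MARGINS = {"top": 0, "right": 0.25, "left": 0.25, "bottom": 0.5}  # inches (assumed)


def margin_pixels(margins_in, dpi):
    """Blank border, in pixels, that each margin reserves at the given resolution."""
    return {side: round(inches * dpi) for side, inches in margins_in.items()}


def content_scale(page_w_in, page_h_in, margins_in):
    """Shrink factor if the page is scaled uniformly to fit inside the margins."""
    usable_w = page_w_in - margins_in["left"] - margins_in["right"]
    usable_h = page_h_in - margins_in["top"] - margins_in["bottom"]
    return min(usable_w / page_w_in, usable_h / page_h_in)


print(margin_pixels(SUGGESTED_MARGINS, 300))  # {'top': 0, 'right': 75, 'left': 75, 'bottom': 150}
print(round(content_scale(8.5, 11, SUGGESTED_MARGINS), 3))  # 0.941 on a letter-size page
```

At 300 dpi the half-inch bottom margin gives a 150-pixel strip for the Bates number, and on an 8.5 x 11 inch page the content shrinks to about 94% of its original size.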

Q: What happens if one of the servers or clients crashes in the middle of a conversion? How does DA recover? How does the user manage events like this?

A: All conversions are controlled by the client, but can be handled on either a client or server.

If the server machine dies, the client will time out, then go on to try the next item in the list. If the client machine dies, the user can restart that machine, and it will pick up where it left off. All conversion status information is maintained in an XML file, which is updated after every conversion.
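The recovery behavior described above (a status file updated after every conversion, so a restarted client resumes where it left off) can be sketched as a checkpoint loop. Everything here is hypothetical: the XML layout, the state names, and the function names are illustrative, not Discovery Assistant's actual file format. A server timeout is modeled as `convert` raising `TimeoutError`, which marks the item failed and moves on to the next one.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

FINISHED = ("converted", "failed")  # states that are not retried on restart


def save_status(path, status):
    """Write {item_name: state} to an XML file, replacing the old file atomically."""
    root = ET.Element("conversions")
    for name, state in status.items():
        ET.SubElement(root, "item", name=name, state=state)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as fh:
        ET.ElementTree(root).write(fh, encoding="utf-8", xml_declaration=True)
    os.replace(tmp_path, path)  # a crash mid-write leaves the previous file intact


def load_status(path):
    """Read the status file; a missing file means nothing has been converted yet."""
    if not os.path.exists(path):
        return {}
    root = ET.parse(path).getroot()
    return {el.get("name"): el.get("state") for el in root.iter("item")}


def run_conversions(items, convert, path):
    """Convert each unfinished item, checkpointing the status file after every conversion."""
    status = load_status(path)
    for item in items:
        if status.get(item) in FINISHED:
            continue  # finished before a crash/restart: pick up where we left off
        try:
            ok = convert(item)
        except TimeoutError:
            ok = False  # e.g. the server stopped responding: mark failed, try the next item
        status[item] = "converted" if ok else "failed"
        save_status(path, status)
    return status
```

Writing to a temporary file and then calling `os.replace` is a design choice for this sketch: if the machine dies while the status file is being written, the last complete checkpoint survives.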