e-Discovery Software

Q&A - EDD Processing Software

Q: I'm probably overlooking something simple here but if I have a batch of converted documents and would like to export them as a single searchable pdf, how might I go about this?

A: If the exported documents are scanned PDF or TIFF, then to make them searchable you still have to OCR them. To do this, import the documents into an OCR application (ABBYY), and convert them to searchable PDF. You can use Acrobat to import multiple files into a single PDF.

If you want Discovery Assistant to export searchable PDF files directly, this can be done as follows:

  1. Install a PostScript printer (Apple LaserWriter or Hewlett Packard PostScript printer).
  2. Install the Discovery Assistant PostScript add-on.
  3. Convert source documents to PostScript.
  4. Export the resultant documents as searchable PDF.

The output files can then be combined into a single PDF using Acrobat Professional.

If your original documents are TIFF or scanned PDF, then converting to PostScript and then to searchable PDF will still not give you what you want: the searchable text will be missing from the PDF.

What our customers normally do is convert everything to TIFF, then smart OCR to get text from those files that do not contain text at time of printing (original documents are TIFF, scanned PDF, or JPEG). They then export TIFF and TEXT (two different file types).

Note: Files with editable text at time of printing have their text extracted when they are printed.

Q: Why do msg files have N.A. under status?

A: The reason for the NA is that we are having trouble extracting the message file. If you select the item in question, then do an 'open source', that may give the reason for the NA failure. Alternatively, run the logger, then for that one item do a 're-check'.

Q: Why does the project identifier in General tab stay the same default in every case and is there a way to change that default?

A: The three-letter project name is based on the long name, heavily weighted to the first couple of characters. If the saved long name varies significantly in the first couple of characters, then the project identifier will likely be unique. To change the identifier, go to the Options Dialog / General Tab, and change the project identifier.

Q: Can I combine different projects later if I add in two drives from the same person, but later want to produce by custodian?

A: Here's our recommended way of combining multiple projects:

  1. Download and install the Global Master application from the www.DiscoveryAssistant.com web site.
  2. Add the projects that are to be combined into the 'global deduplication' project. Project settings will also be synchronized at this point.

When exporting, you can export from each of the separate projects to a specified directory (or to the same directory, as long as all files are uniquely numbered).

You can then combine the two load files (text editor), or provide the end user with two load files instead of the one (fairly easy to load in two load files).

What we've seen in the past is that if projects get too big, the load file gets too big to load and has to be broken down into pieces anyway.
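Combining two load files by hand in a text editor can also be scripted. The following is a minimal Python sketch, assuming both load files are plain delimited text whose first line is an identical header row. The function name `merge_load_files` and the file names are hypothetical and are not part of Discovery Assistant.

```python
def merge_load_files(paths, out_path, encoding="utf-8"):
    """Concatenate load files that share one header row.

    Assumption: each file is line-oriented text and its first line is the
    header. Returns the number of data rows written.
    """
    header = None
    rows = 0
    with open(out_path, "w", encoding=encoding, newline="\n") as out:
        for path in paths:
            with open(path, "r", encoding=encoding) as src:
                first = src.readline().rstrip("\n")
                if not first:
                    continue  # empty file, nothing to merge
                if header is None:
                    header = first
                    out.write(header + "\n")  # keep the header only once
                elif first != header:
                    raise ValueError(f"{path}: header differs from the first load file")
                for line in src:
                    line = line.rstrip("\n")
                    if line:
                        out.write(line + "\n")
                        rows += 1
    return rows
```

The sketch streams each file line by line, so it does not need to hold a large load file in memory. It refuses to merge files whose headers differ, since that usually means the field lists were exported differently.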

Q: I recently updated my Adobe Reader from 8.0 to 9.0, by uninstalling the older version completely then installing the newer version. For some reason the printto command was removed from the computer when I did this. So I manually copied a printto command in and updated the link to refer to the newer version, but for some reason it still won't print. So I reinstalled the Discovery Assistant program and restarted the machine, and it still won't TIFF. Is there any issue with the newer version of Adobe Reader that could be causing this issue?

A: We're using DDE, and AdobePrintTo is using the operating system to launch the application from this registry location: HKEY_CLASSES_ROOT\AcroExch.Document.7\shell\Open\command. All you need to do, I believe, is change the location to point to the Acrobat 9.0 exe.

Q: We are processing a bunch of really garbage email for a client. A number of the email messages, both Outlook Express and Outlook, have embedded http: links to images (advertising type stuff). A lot of the image links time out, causing "Error -11", probably because the linked image is no longer available.

Is there a way to process these emails without trying to follow the dead http links?

A: There's a quick fix: Open Internet Explorer, Choose 'file / work offline' from Explorer, then quit.

If you try re-converting, it will now work (no timeout).

Note that this will affect the appearance of the output. Many web pages retrieve not only content but style sheets (formatting) from the Web at display time; the way the page displays will be different without the style sheet, but there will not be an error or warning. Many pages also load script libraries from the web at display time; these will throw script errors ("object required" or "undefined"). Discovery Assistant will intercept script errors and continue, but the absence of the script and its effects will change the way the page displays.

Q: How many gigabytes per day can you process? Discuss your processing (gigabyte) capacity including, but not limited to, Metadata extraction, conversion, production to tiff and searchable PDF.

A: Ideal is 1 to 2 gigabytes per project. Multiple projects can be processed using GlobalMaster.

Q: Which versions of MS Outlook can you process?

A: We support Outlook 2000, 2002 (XP), 2003 and 2007. Also, Outlook Express (eml) support is included.

Q: Are you able to process Lotus Notes files? Which versions of Lotus Notes can you process?

A: We support Notes 6, 7, and 8 Lotus Notes clients. (NSF files).

Q: Do you have to convert first to Outlook PST in order to process Lotus Notes?

A: We process NOTES output directly.

Q: What problems have you experienced in processing data, particularly, Lotus Notes data?

A: Occasionally we have file import errors. There are various ways to get around these problems. Best if we discuss as they come up.

Q: How have you resolved the problems you have experienced in processing data?

A: We have resolved all reported problems. I'm sure there are still many problems.

Q: Please describe the processing steps from your receipt of the data to the deliverable.

A: Discovery Assistant supports the following data flow process: Crack / enumerate / de-duplicate / convert to TIFF, Metadata and Text / deblank / Smart OCR / BatesStamp / Assign DocID / Export to Concordance, Summation, Ringtail and CSV.

Q: What extra steps do you take to secure the data in-house and during transfer (electronic or otherwise)?

A: We're assuming that the data processing facility is secure. No need for the computer to be connected to the internet. The assumption is that processing is done on an island surrounded by high concrete walls. Anything that happens within that processing environment is separated from the rest of the world.

Q: What procedures do you have for tracking chain of custody?

A: We create a report for every processing activity.

Q: What is your data storage and retention policy?

A: We license software to you, so there is no need for us to see any of your customer data.

Q: Do you have the latest version of PGP Encryption software installed at your processing facility? Can you decrypt PGP Encrypted files and Hard Drives?

A: If source files are encrypted, we request that the encryption be removed before processing. We identify encrypted files by 'failing' to process these files. The user must then determine the cause of failure, and resolve the problem.

Q: Can you de-crypt WinRar files?

A: Requires a third party tool.

Q: What Pre-Processing reports do you provide?

A: Summary report as to how many files, total data size, and an estimate of the number of pages.

Q: What file/data exceptions do you identify at the Pre-Processing stage?

A: We categorize files as convertible and 'non convertible' at time of import. For non-convertible files, the operator has the option of installing the appropriate software to do conversions, to assign the file to a different 'supported' file type, or to pass-through the file to create a place-holder.

Q: Can you generate a list of password protected files in a Pre-Processing stage?

A: Best to check failed files, and then work backwards from there. If a file is password protected, suggestion is to make a copy of the file, uniquely named to match the file-id of the failure file, un-protect the file, then process the decrypted items in a separate project.

Q: Please describe your procedure to identify duplicates?

A: We calculate an MD5 hash value for each file processed. If we find a match, we do a binary comparison to confirm that the files really are identical. If de-duplicating across multiple projects, then we rely on the MD5 hash value alone. Duplicates can be 'skipped', or 'skipped if parent is skipped'.
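The hash-then-compare approach described above can be illustrated with a short Python sketch. It is not the product's code; the function names `md5_of` and `find_duplicates` are hypothetical, and it assumes the files are ordinary files on disk.

```python
import filecmp
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Map each duplicate path to the first-seen file with identical bytes.

    A matching MD5 only nominates a candidate; the byte-for-byte
    comparison (filecmp.cmp with shallow=False) confirms it.
    """
    masters_by_hash = {}  # digest -> list of confirmed-distinct files
    duplicates = {}
    for path in paths:
        candidates = masters_by_hash.setdefault(md5_of(path), [])
        for master in candidates:
            if filecmp.cmp(master, path, shallow=False):
                duplicates[path] = master
                break
        else:
            candidates.append(path)  # first file seen with this content
    return duplicates
```

Skipping the binary comparison, as is done across multiple projects, trades a very small risk of a hash collision for not needing both files on hand at the same time.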

Q: Please describe the hashing algorithms used in de-duplication of emails and emedia files.

A: With Emails, you have the option of specifying what fields to remove from the hash calculation. For instance, you may not want to hash on sent/received dates, but keep everything else. We then hash the 'text' equivalent of the email message, removing any graphic or attachment from the calculation.
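As a rough illustration of hashing an email with selected fields left out, here is a minimal Python sketch. The field names (`subject`, `sent`, `body`) and the function `email_hash` are assumptions made for the example; the exact fields and text normalization used by Discovery Assistant are not documented here.

```python
import hashlib


def email_hash(fields, exclude=()):
    """Hash the text fields of an email, skipping the excluded ones.

    fields:  dict of field name -> text value (attachments and graphics
             are assumed to have been left out already).
    exclude: field names to drop from the calculation, e.g. date fields.
    """
    excluded = {name.lower() for name in exclude}
    digest = hashlib.md5()
    # Sort by field name so the result does not depend on dict order.
    for name, value in sorted((k.lower(), v) for k, v in fields.items()):
        if name in excluded:
            continue
        digest.update(name.encode("utf-8") + b"\x00")
        digest.update(value.encode("utf-8") + b"\x00")
    return digest.hexdigest()
```

With `exclude=("sent", "received")`, two copies of the same message that differ only in their date stamps produce the same hash, so they are treated as duplicates.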

Q: Can you perform near de-duplication on email threads and items?

A: At the moment we do not support near de-duplication. However, we do support the Microsoft Office internally generated "Conversation Topic" and "Conversation Index" values - that are used to associate emails within an email chain.

Q: Can you de-duplicate based on a "family" hash?

A: At the moment, no.

Q: How many file types can you process?

A: At time of install, we list all supported file types on the installation machine by querying the machine as to what software products are installed. At time of import, we identify files that we don't know how to duplicate. Users can then install the owner application in order to support that identified file type.

Q: Can you perform Full Text search? Is this functionality embedded in your EDD system, or is it a separate process?

A: What we recommend is process 'metadata only', then export Source and Metadata. You can then use the third party product DTSearch to do the searching. The DTSearch process produces a list of files to be produced. (Can select this list in Discovery Assistant).

Q: Can you perform Sender/Recipient search? Is this functionality embedded in your EDD system, or is it a separate process?

A: This information is extracted in the metadata, and can be exported to a load file.

Q: Can you perform Date Filtering? Is this functionality embedded in your EDD system, or is it a Pre-Process?

A: This information is extracted in the metadata, and can be exported to a load file. We do display one date at time of import (modify date), and you can sort on that one date if you want.

Q: Can you filter by file extension, file type, or file size? Is this functionality embedded in your EDD system, or is it a Pre-Process?

A: You can choose to not import EXE and DLL when scanning files. Can also conditionally not process attachments. Users can drag-and-drop files, add files, add a file list, or add a directory. Once imported, files can be sorted by file-type or file-size.

Q: Can you filter based on a list of known MD5 Hash values? Is this functionality embedded in your EDD system, or is it a Post-Process?

A: If you sort on MD5 Hash, that will work. If you export hash values and FileID's to a CSV file, you can potentially match an externally generated list of hash values with the exported values, and produce a list containing only the fileID's that match.
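Matching an external hash list against the exported CSV can be done in a spreadsheet or with a few lines of script. The Python sketch below assumes the export has a header row with columns named `FileID` and `Hashcode`; those column names and the function `matching_file_ids` are assumptions for the example and should be adjusted to the actual export.

```python
import csv


def matching_file_ids(export_csv, known_hashes,
                      id_col="FileID", hash_col="Hashcode"):
    """Return the FileIDs whose hash appears in known_hashes.

    Comparison ignores case and surrounding whitespace, since hex
    digests are often written in either upper or lower case.
    """
    wanted = {h.strip().lower() for h in known_hashes if h.strip()}
    with open(export_csv, newline="", encoding="utf-8") as f:
        return [row[id_col]
                for row in csv.DictReader(f)
                if row[hash_col].strip().lower() in wanted]
```

The resulting list of FileIDs can be saved one per line and used as the list of items to select. Keep in mind that the match only works if the external hashes were computed the same way as the exported ones.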

Q: Can you accommodate context searching?

A: Best to process metadata, export source, and use DTSearch to go through the contents.

Q: Please list specific Metadata fields you extract?

A: There are over 100 fields we support. To get a list of fields, go to the export dialog box, and open the 'Fields' dialog to see the complete list of metadata fields. You can also generate a 'report' from that dialog.

Q: What Metadata fields do you not extract?

A: None that we are aware of.

Q: Can you customize the Metadata field order?

A: Yes, the dialog allows you to set the output order. You can also sort based on name, type, or description.

Q: Can you customize the Metadata field names?

A: Yes, by clicking on the metadata field name twice, you can then edit the field.

Q: Can you print documents to Tiff?

A: We support Color and B&W Tiff.

Q: Can you print documents to color JPG?

A: We support JPG compressed TIFF as an export type. We are about to add support for 24 bit LZW compressed TIFF.

Q: Can you print documents to Searchable PDF?

A: Yes. You must first install the Postscript add-on. Then at time of export, we convert postscript to text searchable PDF.

Q: Are you able to include page breaks in extracted body text?

A: Extracted text includes page breaks. You can optionally include bates numbers in the extracted text (on a per-page basis).

Q: Does the output match the native file input for emails (both Outlook and Lotus Notes)?

A: Our recommendation is to assign DocIDs and make the output filename the same as the DocID. You can optionally name the exported files by the email subject, or original filename.

Q: Does the output match the native file input for emedia? Do you preserve formatting (e.g., graphics, bold, underline, italics)?

A: The TIFF file exactly matches the native file (same as if you print). The extracted text is text only (no formatting).

Q: Please list the custom settings for printing Excel files your system is setup for.

A: The following settings are available:

  • Set all worksheets to active before converting
  • Clear print area before converting (print all cells)
  • Clear Headers
  • Clear Footers
  • Set Orientation
  • Set Scale
  • Print Comments
  • Print Order
  • Print Quality
  • Paper size
  • Un-hide hidden worksheets before printing
  • Un-hide hidden cells, columns and rows
  • Export formulas to text file
  • Disable macros
  • Disable recalculation
  • Recalculate column widths
  • Recalculate row heights
  • Turn Grid lines On/Off
  • Turn row/column headings On/Off

Q: Can you include slip-sheets for files with exceptions?

A: Yes. The slip-sheets contain a printed version of the metadata for that file.

Q: How do you handle Autodate and AutoPath features in the body and Header/Footer areas of OLE files?

A: Currently we either don't update (use last saved values), or print the field-codes in place of the values. We're working on the ability to 'replace' AutoPath and AutoDate field codes with 'placeholders' before printing.

Q: Are you able to suppress current date when printing emails?

A: Yes. We also warn the user when exporting if any of the dates are today's date. The application can also be set to change the date/time to the last modified date before converting, ensuring that any macro date/times are correct.

Q: Do you OCR documents that do not have extractable text?

A: Yes. The feature is called 'Smart OCR'

Q: For native files, how do you deliver email and attachments? How are the documents extracted? Are the attachments extracted?

A: Exported MSG files contain the email and attachments. Embedded images are included in the converted TIFF file. Separate files are produced for each of the email attachments. There is an option to also produce the TXT/HTML/RTF message body in place of the MSG file if required.

Q: How is Parent/Child relationship maintained and presented? How are the grand-child level attachments presented?

A: Metadata fields are exported that keep track of the parent/child range values. You can also optionally assign Document IDs that reflect the parent/child hierarchy (0001.001, 0001.002, etc).

Q: Can you export data to common litigation support software applications such as Concordance or Summation?

A: Support for Concordance (Opticon and Ipro), Summation, Ringtail, CSV and Zantaz Introspect.

Q: Can you produce IPRO (LFP) load files?

A: Yes

Q: Please describe your QC process.

A: There is a QC module that allows you to review every TIFF page produced.

Q: Describe your reporting functionality (exception, current status and progress reports).

A: Summary Report, report for every single operation that is requested. The XML project file can also be converted into a report (MDB or XLS).

Q: Is there some way to load a list of MD5 hash values into a project so we can remove any files from the project that were supplied prior to our purchase of DA?

A: You've processed files previously using another product. You want to take the 'hash' value of those files, and remove the matching file from Discovery Assistant.

Difficult to do. We use the MD5 hash algorithm, but that doesn't necessarily make the resultant hash values the same.

Other possibility is to search/sort based on filename, or subject. It would require following the process below:

  1. convert 'metadata' only.
  2. export metadata including FileID or DocID
  3. try to match up metadata fields with previous processed files.
  4. produce a list of 'matching' FileID's
  5. use the Select button to load in the list of FileID's into Discovery Assistant
  6. remove 'selected' items.

If you have the original source files of the data that was produced:

  1. Download the GlobalMaster product from the web site. (www.DiscoveryAssistant.com)
  2. Create a new project
  3. Load the source files (that have already been produced).

Add the project into the GlobalMaster project to set 'global master' fields.

Then, add the second project to the GlobalMaster. The second project will now identify GlobalDuplicates. Anything that is a GlobalDuplicate shouldn't be produced from the second project.

Q: A project was tiffed and exported successfully. Upon review, client requested that a subset of the original list be restamped and rebates'd. If I remove the un-required files from the converted tab, what are the implications? What I don't want to do is start a new project.

A: Best to do a 'SaveAs' of the project. This will create a copy of all the tiffed images. Then, open the new project, remove the unwanted images, and export. If you ever have to, you can always go back to the first project exactly as you left it and produce more copies.

Q: We are working with another client and they are asking for the Outlook GUID’s for Email-ID & Message-ID. I looked in the DA program and see a lot of fields except these two. Does the program support these fields?

A: Email-ID and Message-ID are the same thing (under CSV export this is called MSGID). For us to indicate a GUID, the MSG file must be in a PST file.

For whichever type of export you are doing (Summation, Concordance, etc.) sort the fields on 'Type' and look in the identifier category.

Sample MsgID:
00000000EB6B17A9A661A04681E50ED0AA1D185624002000

Q: Are file types determined by header/signature, or by extension?

A: Filetypes are determined by the file contents first, then by extension if we don't recognize the contents. To see what file types are supported, use notepad to open fassoctable.txt in the installation directory.
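The "contents first, extension second" rule can be sketched in Python as follows. The signature table holds a handful of well-known magic numbers and is only illustrative; it is not the contents of fassoctable.txt, and the function name `detect_type` is hypothetical.

```python
import os

# A few well-known file signatures (magic numbers) at offset 0.
SIGNATURES = [
    (b"%PDF", "pdf"),
    (b"II*\x00", "tiff"),   # little-endian TIFF
    (b"MM\x00*", "tiff"),   # big-endian TIFF
    (b"\xff\xd8\xff", "jpeg"),
    (b"PK\x03\x04", "zip"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy Office, MSG
]


def detect_type(path):
    """Identify a file by its leading bytes, else by its extension."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, kind in SIGNATURES:
        if head.startswith(magic):
            return kind
    # Contents not recognized: fall back to the extension.
    ext = os.path.splitext(path)[1].lower().lstrip(".")
    return ext or "unknown"
```

Checking the contents first means a PDF that was renamed to `.txt` is still treated as a PDF, while plain text files, which have no signature, are typed by their extension.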

Q: At some point OCR of docs missing text (~3000 records) froze. Any ideas?

A: Can you confirm that the Microsoft Office OCR package is installed (Microsoft Office Tools / Microsoft Office Document Imaging)? To confirm, run Imaging, open a Tiff file, and under Tools, choose 'recognize Text using OCR'. The default is to use Office Imaging. If Office Imaging isn't installed, then we use our own limited OCR engine (which may be the likely problem).

Q: PowerPoint crashed half way through a tiffing job, I didn't see it happen so I don't know why.

A: Am wondering if you can duplicate the problem printing using the same PowerPoint file? If so, run the logging feature (Logger button on top right), then send us the log file. That may give us a clue as to where the problem lies.

Q: Dialog boxes, such as Adobe version warnings and Office password protection, do not get answered, so these files all time out. This adds a considerable amount of time to the printing.

A: Dialog boxes for Adobe can be closed by training the program to recognize them using the Admin / Configure / AutoClose Dialog box.

If files are password protected, we currently don't have a password solution. What we normally do is timeout waiting to print, then close the application. If you send us a screen shot of dialogs we're not closing, we can investigate further.

Q: Is there a way to adjust the time out length?

A: Job Start timeout can be set from Admin / Configure / Timeouts tab. Print Job Start is currently set to 60 seconds.

Q: I spent quite some time setting up the default fields for Concordance so that they match the default email template that most litsup guys use at our clients. Then I launched an export with the Discovery Assistant template I saved the day before. I lost my work on the default fields as the export template overrode it. Is there a quicker way to create a default Concordance template that would include the Concordance email template fields + your additional fields?

A: The default export field names and default settings are stored in the master file DiscoveryAssistant.xml in the installation directory.

When you create a new project, we use the default values from the DiscoveryAssistant.xml file to populate the "newproject.xml" file (now the currently active project).

If you change the names in the newproject.xml file, save the project, exit, and re-open the project, the changes will remain. (you can see them if you open and view the newproject.xml file with Notepad).

If you go to the Export Dialog, and do a 'save profile', you can save these most recent settings. By doing a 'load profile', we load the saved default values, and make them the 'new' defaults.
(Note: we prompt first before over-writing the existing items).

Q: How do I move a project sitting on the E: drive on one computer to the F: drive on the other computer.

A: The answer is to do a 2 stage copy process.

Need to identify a common network directory that both computers can link to, say drive s:

From the first computer, do a SaveAs to drive S:
From the second computer, open the project on drive S:, then do a SaveAs to drive F:

At one point you will be prompted to copy across Source. If the location of the source files is not the same between the two computers, then you need to 'copy source' when prompted.

Note: we try to keep the 'last saved' dates of the copied source correct.

Q: How do I export Just OCR'ed text?

A: To export just OCR'ed text, open the project, select 'smart OCR', create the OCR text, then re-export, but choose 'NULL' for the export image type and select 'text files' ON.

Q: What is the best quality OCR available?

A: For good OCR results, you need to install and use the Microsoft Office Document Imaging application with OCR support (uses Omnipage from Nuance). The Discovery Assistant built-in OCR engine provides only minimal capabilities.

Q: How do I import bates stamped TIFF images for export so as to take the minimum time possible?

A: If you want to pass-through TIFF images, go to Admin / Configure / Options Tab, and at the bottom, select 'enable no-print convert on image files'. Now when you go to convert TIF to TIF, we do a direct pass-through, but still count the pages, and create the appropriate MetaData. You still have to Queue from the convertible to the Queued, then select Convert, but the process now goes extremely quickly (the time required to do a file copy).

Q: How do I get a list of global duplicates across multiple projects?

A: Add all the projects to globalMaster, then open each project, and export to CSV. You need to export the following fields: GlobalMaster, GlobalCount, DocID, DupPaths, and Hashcode. Output filename is DocID. Tiff type is NULL.

Next, Load each TXT file produced into a separate XLS sheet.

  1. Sort on 'Global Master Count' to differentiate between files that have duplicates, and files that are unique.
  2. Copy those items for which Global count is > 1 to the clipboard, and paste them to the designated 'global duplicates' tab. Each input file is appended to the output global Duplicates tab.
  3. Sort the values in the global duplicates tab based on HASHCODE. This groups duplicates together. Second sort key is 'global primary'. Third sort key is DocID. Global primary = true is now the first instance of the document. Every other item below it is a duplicate. Each new 'global primary' marks the start of a new set of duplicates.
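The same filter-and-sort steps can be scripted instead of done in a spreadsheet. The Python sketch below assumes each exported row has already been read into a dict; the keys `hashcode`, `global_count`, `is_primary` and `doc_id` are hypothetical stand-ins for the exported columns, and `group_global_duplicates` is not a Discovery Assistant function.

```python
from itertools import groupby


def group_global_duplicates(rows):
    """Group rows that have duplicates, primary document first.

    Returns a dict mapping each hash to its DocIDs, ordered so that the
    global primary comes first and the remaining duplicates follow in
    DocID order.
    """
    # Step 1 and 2: keep only items that have at least one duplicate.
    dupes = [r for r in rows if int(r["global_count"]) > 1]
    # Step 3: sort by hash, then primary-first, then DocID.
    dupes.sort(key=lambda r: (r["hashcode"], not r["is_primary"], r["doc_id"]))
    return {
        hashcode: [r["doc_id"] for r in group]
        for hashcode, group in groupby(dupes, key=lambda r: r["hashcode"])
    }
```

Rows from every project can be concatenated into one list before calling the function, which mirrors appending each input file to the global duplicates tab.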

Q: I've got a TIFF file that does not convert properly (comes up all black), but yet it looks fine in Microsoft Imaging.

A: Some TIFF files can only be read by Microsoft Imaging. If a TIFF file fails to convert properly, suggest trying the following:

  1. Identify the files you want to convert. These will either be in the convertible tab, or the non-convertible tab.
  2. Select the 'Assign Type' button. Assign type for selected TIFF files to .mdi - Microsoft Office Document Image File.
  3. Requeue, and convert.

If you want to make Microsoft Imaging the default conversion application for TIFF files, then in Admin / Configure / Documents, under TIF and TIFF, open the Modify button, and click on the 'copy From' button. Can then choose '.MDI' as the file type.

Q: if a client does not want to tiff images from a pst file - can we produce files in native file format from a pst?

A: Discovery Assistant can be set to export just metadata and native source files from the processed pst file, without having to TIFF. Export file types include load files for Summation, Concordance, Ringtail and CSV.

Q: When I configure the Outlook rendering format to "Outlook MSG" and print Outlook Meeting Items, I often get the attached message. If I set the rendering format to "Default" then I don't get the error.

A: Normal behavior is to use 'default'. In this case we extract the message as HTML, RTF, or TXT, then use the appropriate application to print.

Outlook is normally capable of printing direct to the printer, but every now and then it croaks. We keep thinking that Microsoft is going to fix this problem, and believe they do in fact have it working if you have Office 2007 installed. We've kept support for 'Outlook msg' in the program mainly for trouble-shooting purposes.

Our recommendation is to stick with using 'default'.

Q: How do you handle documents that have errors? For instance, I had an Excel file that DA reported as non-convertible. I opened the file using the "Source" feature within DA and Excel reported that some data was unable to be opened due to errors. When I clicked through two prompts, there was indeed data within the spreadsheet. I then saved the spreadsheet as the same name in a different location, deleted the original, and ran "Recheck" on the file and now it is convertible.

When DA encounters an error does it immediately deem the file as non-convertible? If so, is there a way to have DA analyze those further to get them to come in? IE: bypass prompts?

A: If there are printing errors or file open errors, we normally time out. The failure document then ends up in the 'failed' tab. The idea being that a human being can then identify the root cause.

If the root cause of the conversion problem can be solved through automation, then that can be done using the Admin / Configure / AutoClose feature (train it to close dialogs). Otherwise, the document can be manually saved, and re-converted (over-writing source, or TIFF file substitution from the QC module). The failed document can also be passed-through using the 'pass-through' feature.

Q: How do I print HTML files that are wider than a standard 8.5 inch page without losing anything off the right hand side?

A: Need to change the following printer properties of 'ImageMAKER XDC Servic1' printer:

  • Printer Properties / General Tab / Printing Preferences, set to Landscape.
  • Printer Properties / General Tab / Advanced: set the paper size to be legal.
  • Printer Properties / Device Settings - Advanced settings, set the orientation to 'leave image orientated landscape' under Advanced Settings / Handle landscape pages.

Q: Can you support bulk conversion from these repository types: Lotus Notes, Windows file system, and MS Exchange?

A: Discovery Assistant supports converting Notes (NSF), Microsoft Exchange (PST, MSG) and Microsoft Outlook Express (EML) files. Can also scan in a directory (Windows File System), or single files for conversion.

Q: Which mime-types can you convert to PDF (e.g. Text, MS-Word, JPEG)? A URI to your supported mime-type list is acceptable.

A: We use the native application to convert the file. We launch Word to convert .DOC files, Internet Explorer to convert HTM files, and Acrobat Viewer to convert PDF files. When installing Discovery Assistant, we look for what file types are convertible, and list them at the end of installation.

Q: Do you support searchable meta-data in the PDF (e.g. mime-type, title, subject, author for MS-word, and every Lotus Notes available field)?

A: We extract metadata separate from the image data, and export it to a CSV file as part of the export (tab-delimited file). We extract and track 99 different metadata field items.
spec sheet: http://www.discoveryassistant.com/Nav_Top/TechNotes.asp

Q: Does the PDF conversion closely resemble what would appear in the native document interface (e.g. PDF looks similar to what MS-Word displays for a MS-Word document)?

A: Output looks exact. We use the native application to print.

Q: Do you support Lotus notes parent-attachment sets? Does your product either merge these into a single document or have children point to the attachment or vice versa?

A: Discovery Assistant keeps track of parent/child relationships. This information is output in the Metadata. Attachments are produced as separate documents.

Q: Do you capture all textual and graphical information in the conversion to PDF? If not, list what is not supported (.e.g. embedded JPEG images).

A: If the document can be printed, then all graphical information is in the output PDF file.

Q: Do you have OCR capabilities to transform embedded images into a searchable PDF?

A: Two conversion options exist:

  1. Convert to color or B&W image, and export scanned PDF plus OCR text. (if original source document is text based, then we extract TXT directly without OCR). If the source document is an image, then we use the installed Microsoft Office OCR engine to convert to text (Omnipage engine).
  2. Convert to searchable PDF (no extracted OCR text). (requires downloading and installing additional modules).

Q: Can Discovery Assistant open files from Thunderbird, which is an email account management tool? If so, how do we accomplish this? We could not open the PST files that we extracted from the Thunderbird account.

A: Two things to check:

  1. Can you open the PST file using Outlook?
  2. Also, can you open other PST files using Discovery Assistant?

For more details on the error, need to run imglog.exe (red button, top right of the application).

Q: I'm getting an error message: "Ambiguous Name Detected" appears when converting a Word file (tmp.dde or tmpdde).

A: See article Q165860: (Microsoft Knowledge Base Article - 165860 )
http://support.microsoft.com/default.aspx?scid=kb%3ben-us%3b165860

To resolve this problem, delete the Tmpdde macro from the Microsoft Word global template (Normal.dot) file.

To do this, use the following steps:

  1. Start Microsoft Word. On the Tools menu, point to Macro, and then click Macros.
  2. Select any macro with the name Tmpdde and click Delete. Click Yes to confirm that you want to delete the macro.
  3. Click Cancel. Close Microsoft Word.
  4. Close all Microsoft Office programs and mail or fax programs. Restart the programs and retry the conversion.

OR

  1. Look for and re-name any occurrence of Normal.dot on your hard drive.

NOTE: Renaming the file Normal.dot removes any user-defined styles, AutoText, Toolbars, and Macro Project Items that have been saved in that template. You can use the Organizer to copy those items into the new Normal.dot that is automatically created.

Q: I've been attempting to convert from .msg to PDF and having a hard go of it - it only converts to TIFF. We're currently hunting a solution to batch-convert msg to text PDF, but to no avail.....

A: For scanned pdf, the Discovery Assistant application would be your best bet. The output can be 'exported' as scanned PDF, and named based on your own defined naming rules.

To export searchable PDF, you can still do this from Discovery Assistant, but need to download the Postscript add-on, and then at time of export, we use Ghostscript to convert from postscript to searchable PDF. (Add-on is part of the download tools on DiscoveryAssistant).

Q: Is there a project size limit in Discovery Assistant?

A: Discovery Assistant works best at 100,000 items (or less) in the project. We've tested up to 500,000 items on a 1 gig machine, but we see speed degradation due to the time required to do periodic saves.

If you are handling more than 100,000 items, our recommendation is to use the Discovery Assistant TeraBite application (separate download), which can enumerate all files (uses an MDB file) and break those files down into multiple load lists (limited by total size, or total items in the list).

If you break a project across multiple Discovery Assistant project files, then you can still do global deduplication using our Global Dedup tool (also a separate download).

Q: Is there a way for clients to come back to us with a list of Doc IDs that they want Bates numbered for production? How would you import that list of Doc IDs and then Bates number only those documents?

A: If you assign DocID's, and export those ID's as part of the exported load files, then users should be able to send you back a text file (or spreadsheet, or csv file) containing a list of files to stamp and produce. That list of Document ID's can then be loaded as a selection set when in the Converted Tab (see Select button / by Document ID list). You can then 'toggle' that selection set, and assign Bates Numbers and Stamp just those items.

Note: you can also select 'parents' of selected items, and 'children' of selected items. This allows your customer to tell you to produce a list of files, along with its parents / children / siblings.

Q: Can Discovery Assistant Produce native load files without tiffing the documents?

A: Queue for conversion 'metadata only'. That will allow you to produce source documents without TIFFing. Most metadata except for extracted text is available (note: text from MSG files is still extracted).

Q: How long will it take to process a GB?

A: One GB takes approximately 20 hours of processing time, and produces 70,000 pages. (1 second a page conversion time, and 1 second a page for import/export/deblank/bates numbering, etc).

Q: Can you handle email from Eudora and Thunderbird?

A: We can handle native MIME email (similar to how Outlook Express is stored) and we think that is how Eudora and Thunderbird messages are stored. If some tweaking is required, we're willing to do the work.

Q: Customer needs Multi-page TIF, metadata and bibliographical data in text delimited format with field headers [associate with doc id number as well]. All parent IDs and attachment IDs must be captured.

A: If you export, and choose Character Separated Text, then under Options, select all - That gives you the full set of metadata fields that we extract and export, including Parent ID and Attachment ID. We can extract limited bibliographical data from Office, Excel, PDF, and email.

Q: How would I set output to Eastern Standard Time, non-military format?

A: We read timestamps as UTC, then convert to local time. Set the machine you are working on to Eastern Standard Time, non-military format; that is how we will then produce the date/times.
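
As a rough illustration of the UTC-to-local conversion described above (this is not Discovery Assistant's own code), the Python sketch below converts a UTC timestamp to Eastern Standard Time and formats it with a 12-hour clock. The fixed UTC-5 offset, the sample timestamp, and the function name are assumptions made for the example, and daylight saving is ignored.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-5 offset stands in for Eastern Standard Time
# (daylight saving is deliberately ignored in this sketch).
EST = timezone(timedelta(hours=-5), "EST")

def utc_to_est_12h(utc_dt: datetime) -> str:
    """Convert a timezone-aware UTC datetime to EST in 12-hour (non-military) format."""
    return utc_dt.astimezone(EST).strftime("%Y-%m-%d %I:%M:%S %p")

# Hypothetical timestamp, as it might be read from a file's metadata.
stamp = datetime(2006, 3, 1, 18, 30, 0, tzinfo=timezone.utc)
print(utc_to_est_12h(stamp))
```

In the product itself the same effect comes from the machine's regional settings rather than from any code you write.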

Q: How do you handle corrupt or password protected files? Placeholder file? Exception report but do not process? Placeholder File; set aside and notify us.

A: Current solution is to fail on first attempted conversion/import. Failed items can be 'copied', unencrypted/unlocked, then re-imported. OR you can do pass-through (placeholder) for these items.

Q: How do I export in chronological order, earliest to latest date?

A: When exporting, users normally ask for parent/child sort order. You can also sort and export based on date/time. If you are planning on sorting on date, then best to first set the date format on the machine as YYYY-MM-DD. All formatted dates will then be sortable.
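
To see why the YYYY-MM-DD format matters for sorting, here is a small Python sketch with made-up dates. It only illustrates the general point that a plain text sort of YYYY-MM-DD strings is also a chronological sort, while a month-first format is not.

```python
from datetime import date

# Hypothetical sample dates, deliberately out of order.
dates = [date(2005, 12, 1), date(2004, 1, 15), date(2006, 2, 28)]

# Formatted as YYYY-MM-DD, a plain text sort is also a chronological sort.
iso = sorted(d.strftime("%Y-%m-%d") for d in dates)

# Formatted as MM/DD/YYYY, a text sort orders by month first, not by year.
us = sorted(d.strftime("%m/%d/%Y") for d in dates)

print(iso)
print(us)
```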

Q: We also need a blowback set for our customer's office.

A: Do you mean hard copies? You can choose to print converted TIFF files (either stamped or unstamped) from Discovery Assistant.

Q: How do we export to CD?

A: You can export VOL\BOX, limit the output size to CD, then burn a CD from that exported data.

Q: When viewing image and text side by side, when I scroll the tiff, shouldn't the text scroll as well?

A: It's an interesting idea, and we will add it to our list of things to look at. The basic issue is that we don't know where the TXT is relative to the image. All we know is that this is what printed to get that TIFF file. We could still probably scroll the text as you scroll the image though, and hope that we get it right.

Q: Can you print the "all Files" Screen so that you can have a hard copy list of file ID, Bates, Name, Status. Hash etc etc- or can you export that to csv file?

A: You can 'copy' to clipboard, and paste into an EXCEL spreadsheet. Other alternative is to use our DAReportManager reporting tool that can load the XML project file, and convert to XLS.

Q: How do you actually select a file from list? I highlight it, and then click a command such as convert, and it says no files are selected do you wish to select all files.

A: The convert button has an 'arrow' beside it that shows a pop-up: Choices are: All, Selected.

Q: What is the mtf file used for, Is it simply a txt document that the attorney can have to see all the metadata fields in an easy to read format?

A: The MTF file stores the metadata before we do an export. Our thinking was that when we put in the search feature, we can search the MTF file without first having to do an export.

Q: Is there a way to see a dedup report of files that were duplicates? I think there is, but I didn't see it.

A: If you sort on the 'duplicated' key, then that gives you a list of all files for which there is a duplicate.

If you run the project file through Global Deduplication, then that will identify the global primary file, and Global count.

On any particular duplicate, you can list its parent, its siblings, or its duplicates by pressing one of the action buttons in the button bar.

Q: Can you tell me what filtering technology you use (Native/Mapi, Verity, Stellent, or Open Source) for the following:

A: We use the following methods to extract files and metadata:

  • PST Extraction
    • we've written our own extractor, that in turn uses the Office OLE object.
  • Lotus Notes Extraction
    • our own tools, which in turn use the Lotus Notes DLL files.
  • Office file metadata extraction
    • use Office OLE where appropriate.
  • Other file types metadata extraction
    • PdfDump (our own tool), all other file types we use standard windows API.
  • Imaging
    • default for TIFF, DCX, PCX, PNG, JPEG is our viewer. We could extract additional info, but currently using Windows API (create / modify / accessed date-time, size).

File type is determined using binary data to get an extension type. Extension type is then translated through the operating system to get a file type.

At the moment, we do not include an integrated search. However, we do support extraction of source, tiff, text, and metadata as separate files. Search tools can comb through these files looking for matches.

For conversions we rely on the following applications being installed:

Microsoft Office, Lotus Notes, Windows Internet Explorer, Adobe Acrobat Reader, and any other viewers necessary to open and print native files.

Conversions are done using the PrintTo interface. Custom converters exist for Word, Excel, PowerPoint, and internal Office / Notes file formats.

Q: Do you know what the limits are on the number of files you can process at a time? I have 3,295 files. I can't get it to load more than a couple hundred at a time. Thanks.

A: You should be able to load up to 50,000+ files in a single project. Suspect the problem is that you are using the 'Add Files' dialog. To load in a directory of files, use the 'Add Folder'. Optionally, you can open explorer, and 'drag' the files across.

Finally, you can construct a list of files, and 'Add From List'.

Q: When I go to convert, there are no output file format types (list is blank) in the Convert dialog.

A: It looks very much like an incorrect installation. The quick fix is to email us the file:
"c:\program files\imagemaker\Discovery assistant\install.log" and we'll try to figure out what went wrong.

Q: My lawyer wants all messages and the attachments to be exported in one TIFF file.

A: We can't quite do that, but here is something that comes close:

To itemize what files belong to what message files:

  • assign Bates Numbers and Document ID's
  • messages with attachments are visually identified with a Bates Group Range value.
  • optionally bates stamp the documents.
  • export stamped or unstamped files.
  • output files are named by DocumentID
  • output a CSV file that includes limited details, including Document ID, and ATTACHMENTRANGE. (may also want to include subject, or title).

Q: I'm having a problem processing MSG files. The Outlook dialog comes up, gets dismissed, but the document does not process.

A: Possible fixes:

Am wondering if I can get you to temporarily turn our 'clickyes' program OFF. Can do this by renaming the registry item: HKLM\Software\ImageMAKER\DA_ClickButton. (add an extra character to the name).

Should now be able to manually close the dialog (and allow for 10 minutes of activity). If this change works, then we'll look further into what the problem might be.

There is an alternative 'close' tool called 'ClickYesSetup' that you can install from the Start / Programs / ImageMAKER Discovery Assistant group. If that is installed, and set to active, and it solves the Outlook Dialog problem, that would be another solution.

Also, if you have any other third-party software that is set up to close these Outlook dialogs, that too could explain the problem: two applications are trying to close the same dialog.

Q: How do I exclude Outlook attachments from being processed?

A: Under the Options / Scan tab, set 'exclude outlook attachments'. When you add in a MSG file, we now ignore all attachments.

Q: How do I control the formatting of the MSG file to look like it was printed from Outlook?

A: Default behavior is to extract HTML / RTF / TEXT, then use the native application to print that file.

To get 'outlook' formatted output, from Options / Outlook tab, change rendering from 'default' to 'Outlook MSG'.

Q: Can we generate a report or export to a .csv / spread sheet of all files that couldn't be converted, non-convertible or failed, that contains the file name, path, etc.?

A: If you download DAReportManager, that will give you the ability to create a spreadsheet containing all the information from each of the tabs. (contact ImageMAKER for download instructions).

Q: How do I handle foreign character sets (like cyrillic)?

A: As far as we know, Summation and Concordance do not support Unicode. For this reason, metadata and extracted text are exported as MBCS (multi-byte character strings), which can be handled by Concordance/Summation.

The TIFF files print using the native application. If the native application supports foreign chars sets, then those characters are properly represented in the TIFF file.

To set up for MBCS output:

  1. Go into control panel and select 'Regional and Language Options'

  2. Go to 'Advanced' tab.

  3. In 'Language for Non-Unicode Programs' drop down list box choose 'Russian' for cyrillic. (Other language choice for alternate character set).

  4. In DA set Outlook rendering format to 'default'.

  5. Start the conversion process.

Adding Arabic language support

If the user did not add Arabic language support during the installation wizard, it can be added after installation completes with the following steps:

  1. Double click the Regional and Language Options from the Control Panel folder.

  2. From the Languages tab select the check box “Install files for complex script and right-to-left language”.

  3. From the Regional Options tab select the required Arabic language locale and location.

  4. Press OK button.

You should be able to review the extracted TXT, and metadata using Notepad. Text data is MBCS encoded.

The following MBCS character sets (code pages) are supported:

  • 874 Thai
  • 932 Japan
  • 936 Chinese (PRC, Singapore)
  • 949 Korean
  • 950 Chinese (Taiwan; Hong Kong SAR, PRC)
  • 1200 Unicode (BMP of ISO 10646)
  • 1250 Windows 3.1 Eastern European
  • 1251 Windows 3.1 Cyrillic
  • 1252 Windows 3.1 Latin 1 (U.S., Western Europe)
  • 1253 Windows 3.1 Greek
  • 1254 Windows 3.1 Turkish
  • 1255 Hebrew
  • 1256 Arabic
  • 1257 Baltic
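
To make the code-page idea concrete, the following Python sketch encodes a Cyrillic string with code page 1251 and shows what happens when the same bytes are read back with the wrong code page. It assumes Python's "cp1251" and "cp1252" codecs correspond to the Windows code pages listed above; the sample text is arbitrary.

```python
# Assumption: Python's "cp1251" codec corresponds to Windows code page 1251.
text = "Привет"  # arbitrary Cyrillic sample text

encoded = text.encode("cp1251")       # one byte per character in this code page
decoded = encoded.decode("cp1251")    # round-trips when the same code page is used
mismatch = encoded.decode("cp1252")   # reading with the wrong code page garbles the text

print(len(encoded), decoded == text, mismatch)
```

This is why the 'Language for Non-Unicode Programs' setting has to match the character set of the data before the exported text is reviewed in a tool such as Notepad.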

Q: I'm wondering if you can give me some additional insight into why there are so many blank pages in this spreadsheet. If you look at it in print preview in Excel, go to page 19 and you will find 19-29 blank. Also 49-60, 79-94, and I quit looking at that point. As far as I can tell, there is no data. Is there something going on with formatting, maybe?

A: In spreadsheets, users usually set a 'print range' of cells to print, and that range usually contains data. However, the print range can miss huge swaths of the spreadsheet, and it's those swaths that we want in the discovery process. Discovery Assistant defaults to printing the entire sheet, defined as the largest box that contains the top left cell and bottom right cell, blank pages and all.

Discovery Assistant spreadsheet formatting settings are controlled through settings in the Admin / Configure / Excel Options page. The following settings are supported:

  • Set all worksheets to active before converting
  • Clear print area before converting (print all cells)
  • Clear headers
  • Clear footers
  • Orientation
  • Scale
  • Comments
  • Order
  • Print Quality
  • Paper Size

Q: Quick question on the hashes that DiscoveryAssistant produces. A client of mine is insisting that the MD5 hash does not have any dashes in it, though the hash values DiscoveryAssistant produces does. If the dashes are removed via search and replace, would it still represent the correct hash value?

A: Yes, it's still valid. We put the dashes in so the number is human readable.

One other point - you can customize how much of the file to hash. We then binary compare if we get any matches. If you are doing global deduplication using the hash value, make sure the Options / De-duping / HashCode sample size is set to 0.
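
A quick way to convince yourself that removing the dashes is harmless is to compare the stripped value against a freshly computed MD5. The Python sketch below does that; the dashed layout (groups of 8 hex digits, upper case) is only a hypothetical rendering for the example and may not match the exact format Discovery Assistant writes.

```python
import hashlib

def normalize_hash(value: str) -> str:
    """Strip dashes and lowercase so two renderings of one hash compare equal."""
    return value.replace("-", "").lower()

digest = hashlib.md5(b"hello world").hexdigest()

# Hypothetical dashed rendering: groups of 8 hex digits separated by dashes.
dashed = "-".join(digest[i:i + 8] for i in range(0, len(digest), 8)).upper()

print(dashed)
print(normalize_hash(dashed) == digest)
```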

Q: The word document I am converting contains 2 pages, but when I process I only see one page.

A: If you see only one page in the TIFF file, and that page contains everything that the original doc contains, then I have a very good explanation...

When switching print drivers, word documents reformat slightly. You can see this when changing the default print driver in Word, with a Word document open. The text position will change slightly.

The same file printed to an HP printer may produce two pages, and when printing to a Postscript printer come out as only one page.

To get exact equivalence between the two printers, we need:

  • to be using the same font set. Postscript fonts are different from True type fonts that we use.
  • print margins on the printers must be the same.
  • resolution must be the same.

Q: What is 'child next' order when assigning Bates Numbers. Also, do you ever assign the same bates number to more than one document?

A: We load the files and attachments in a slightly different order from how case management systems request them.

We load in the first layer of attachments, then go back looking for attachments to the attachments. We understand that the 'proper' way of importing/exporting is to list the first attachment and its children before listing the second attachment. For lack of a better term, we use the name 'child next' order.

To get the right export order, we recommend assigning Bates Numbers in child next order, then sort afterwards on bates number. Then export. You'll note that we also fill in the bates range for each message at this point. (allowing you to confirm that everything is listed in the right order).
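
The difference between the two orderings can be sketched in Python as follows. The message tree and the function names are hypothetical; the sketch only illustrates layer-by-layer loading versus 'child next' ordering, not the product's internal code.

```python
from collections import deque

# Hypothetical message tree: each item maps to its direct attachments.
tree = {
    "msg": ["att1", "att2"],
    "att1": ["att1a", "att1b"],
    "att2": ["att2a"],
}

def load_order(root):
    """Layer by layer: all first-level attachments, then their attachments."""
    order, queue = [], deque([root])
    while queue:
        item = queue.popleft()
        order.append(item)
        queue.extend(tree.get(item, []))
    return order

def child_next_order(root):
    """Each attachment is followed by its own children before the next sibling."""
    order = [root]
    for child in tree.get(root, []):
        order.extend(child_next_order(child))
    return order

print(load_order("msg"))
print(child_next_order("msg"))
```

Assigning Bates Numbers while walking the items in the second order, then sorting on Bates number, yields the parent/child sequence that case management systems expect.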

As for handling duplicates, you have 3 choices at time of conversion:

  • ignore (or copy) duplicates.
  • skip duplicates
  • link duplicates.

If you ignore (or copy) duplicates, then each file gets its own Bates number. If you skip duplicates, then only the first 'converted' file gets assigned a Bates number. If you link duplicates, then multiple 'duplicates' will all be assigned the same Bates number.
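
The three duplicate-handling choices can be modelled with a short Python sketch. Everything here is an assumption made for illustration: the function name, the mode strings, the use of a content hash to spot duplicates, and one number per document rather than per page.

```python
def assign_bates(files, mode, start=1):
    """files: list of (name, content_hash). Returns {name: Bates number or None}."""
    first_seen = {}   # content hash -> Bates number of the first file with that hash
    result = {}
    next_number = start
    for name, digest in files:
        if mode != "ignore" and digest in first_seen:
            # Duplicate of an earlier file: share the number (link) or get none (skip).
            result[name] = first_seen[digest] if mode == "link" else None
            continue
        result[name] = next_number
        first_seen.setdefault(digest, next_number)
        next_number += 1
    return result

# Hypothetical project: c.doc has the same content as a.doc.
files = [("a.doc", "h1"), ("b.doc", "h2"), ("c.doc", "h1")]
for mode in ("ignore", "skip", "link"):
    print(mode, assign_bates(files, mode))
```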

Q: What process is used to identify whether a file is readable. Is a listing such as the NSRL used? Does it utilize the extension of the file or does it extract the file to determine what it is (i.e., if an Excel spreadsheet has a .AAA extension instead of .XLS extension, will it still recognize it as an Excel spreadsheet and treat it accordingly?

A: Process by which we recognize the file type is:

  1. Use the contents of the fAssoctable.txt to do a first pass through. This checks the file type by inspecting the file's contents. If a document has extension DOC, but is really a zip file, then we mark the file as a zip file.

    If the file type isn't recognized, then we keep current extension.

  2. Check if there is a file association set on the computer for the file type. If set, then we assume the file type. If no file association found, then we route the file to the 'unconvertible' tab.

  3. Attempt to do a conversion on queued files. If any conversions fail, then queue those files in the failed folder.

  4. Human operator can review the unconvertible files, if the file type isn't supported, users can look up file type at: "http://filext.com or http://www.nsrl.nist.gov/" Then acquire the owner application, install that application, and then do a 'recheck' to re-check unrecognized files.

    Operator also has the option of manually assigning file type. Otherwise, files can be sent through using the 'pass through' feature.

  5. Human operator can review failed files, and manually 'pass through' those you want to keep. [Just noticed there is no way to manually assign type to failed files].
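
Step 1 above, recognizing a file by its contents rather than its extension, can be sketched in Python as follows. The signature table is a small hypothetical stand-in (the actual format and contents of fAssoctable.txt are not shown here), and the function name is made up for the example.

```python
# Hypothetical signature table; the real fAssoctable.txt format is not shown here.
SIGNATURES = [
    (b"PK\x03\x04", "zip"),
    (b"%PDF", "pdf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy Office compound file
]

def sniff_type(header: bytes, current_ext: str) -> str:
    """Return a type from the leading bytes, else keep the current extension."""
    for magic, file_type in SIGNATURES:
        if header.startswith(magic):
            return file_type
    return current_ext.lower().lstrip(".")

print(sniff_type(b"PK\x03\x04rest-of-file", ".DOC"))   # content says zip
print(sniff_type(b"plain text", ".AAA"))               # unrecognized: keep extension
```

The later steps (file association lookup, conversion attempt, operator review) would then work from whatever type this first pass settles on.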

Q: There is a process that a lot of law firms are doing and I was wondering if you had a way to accomplish this. In order to save money the law firms will request that we only deliver the metadata and OCR, no images. Then, they tag the responsive docs they want and ask us to only TIFF those. Any ideas?

A: When converting, you can choose:

  • MetaData only (requires no conversion, and is very fast).
  • MetaData and text (requires conversion, but we don't spend the time to produce a tiff file).
  • MetaData, Text and TIFF (no need to do this just yet).

If you are dealing with just email, the email text is extracted as part of the MetaData. That might be good enough for your needs (and is fast). Only time you would want MetaData and text is if you want the text contents of attachments, and the text contents of loose documents.

Once you've done the conversion, you can export that data to Concordance/Summation/IPRO/CSV for analysis. Trick is to remember to extract the FileID as one of the export fields.

User imports the data into Concordance/Summation/Case Management System, analyses it, then exports back out a list of documents that they want Tiffed. That list of items needs to include FileID as one of the record items. Using a text editor, or Excel, you should be able to remove any extra data, creating a TXT file with just one FileID per line.
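
If the list comes back as a CSV export, a few lines of Python can do the same clean-up as a text editor or Excel. This is only a sketch: the column names and sample rows are hypothetical, and the function simply keeps the FileID column, one value per line.

```python
import csv
import io

def file_ids_from_csv(csv_text: str, column: str = "FileID") -> list:
    """Return the non-empty values of one column, one entry per record."""
    reader = csv.DictReader(io.StringIO(csv_text))
    ids = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            ids.append(value)
    return ids

# Hypothetical export returned by the reviewer.
sample = "FileID,Subject,Tag\n101,Budget,Responsive\n205,Lunch,Responsive\n"
ids = file_ids_from_csv(sample)
print("\n".join(ids))   # this text is what would be saved as the FileID list
```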

Next, re-open the Discovery Assistant project, create a copy of the project (save as), then remove all converted/queued items. Then, from the Menu, select: Project / Queue from FileID list. Open the FileID list you just created, and only those items with FileID's specified will be queued for conversion.

Convert, bates stamp, export, and you're done.

One advantage of this method is that the client needs to use your services twice: once to get the metadata with FileID, and a second time to get the TIFF files. It would be difficult for the client to identify what documents to process without getting you to first do the extraction. If the client were to give you loose files, they run the risk of losing the file relationship information, and potentially changing the data as items are extracted from the original source documents.

Q: What caliber of person is required to competently operate the DA software?

A: The product needs to be set up and run by an IT knowledgeable person (2+ years experience).

Most problems that we see are 'setup' problems - getting the product up and running in a production environment. Startup issues we see are:

  • not enough memory or hard drive space.
  • needs additional programs installed.
  • problems with existing installed programs.
  • learning the product (there is a lot to review).
  • understanding the conversion process.
  • tricks and tips.

Once the product is up and running, relatively junior people with little or no IT experience can run it. If a problem comes up, then the senior IT person needs to look at the problem, then call us if it can't be solved locally.

Q: What training programs do you offer?

A: A person with IT experience can get the product up and running without any additional training on our part.

That IT person in turn needs to train the user, and be available to answer any user questions. User training by anyone other than the IT person is difficult, as it is the IT person who is going to have to answer the first line questions, and who needs to do the re-installs and first-line trouble shooting.

Q: What database architecture are you guys using, SQL? Access? And do you have any sort of distributed processing for large jobs that I may want to spread over multiple machines?

A: We use an XML file loaded into memory to get the fastest possible database processing speed. Maximum practical size of XML file is 500,000 files.

To handle data sets larger than half a million files, we provide an add-on tool we call Terabite. It builds an MDB file containing millions of files, then exports this list to multiple 'Discovery Assistant load files' by breaking the list down by bytes (2 Gigs) or number of files (100,000). Both numbers are user configurable.
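
The splitting rule described above (a cap on total bytes and a cap on file count per load list) can be sketched in Python like this. It is not the Terabite implementation; the function name, the default limits, and the tiny sample data are assumptions for illustration.

```python
def split_into_load_lists(files, max_items=100_000, max_bytes=2 * 1024**3):
    """files: iterable of (path, size_in_bytes). Returns a list of load lists."""
    batches, current, current_bytes = [], [], 0
    for path, size in files:
        full = len(current) >= max_items or current_bytes + size > max_bytes
        if current and full:
            # Close the current load list and start a new one.
            batches.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    if current:
        batches.append(current)
    return batches

# Tiny hypothetical data set with small limits so the split is visible.
files = [("a", 40), ("b", 50), ("c", 30), ("d", 10), ("e", 10)]
print(split_into_load_lists(files, max_items=3, max_bytes=100))
```

In this sketch a single file larger than the byte limit still gets a load list of its own rather than being dropped.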

To set up multiple jobs, break the data into smaller projects, (possibly using our Terabite program to automate the process), then load and process multiple projects on multiple machines. As long as the source files are referenced with a \\Server\share UNC name, it doesn't matter what machine is used to do the processing.

In addition to the Terabite tool, we also provide a tool to map XML project files back to XLS or MDB to facilitate reporting.

Q: What are the limitations as to the maximum number of files to be processed in a batch?

A: Our current recommendation is to keep the number of files per project batch to less than 500,000 files. When processing the batch, to optimize speed we load the file list into memory. As the list size grows, the time to load/update/save the list gets increasingly longer. 500,000 files seems to be a practical limitation. There are no physical limits on file sizes, or number of pages per file.

Recognizing that different data sets have different file densities, the rule of thumb we use is each gigabyte of data translates to 70,000 converted pages. At an approximate conversion speed of 1 page per second, a single copy of Discovery Assistant should be able to process 3,600 pages an hour, or a gigabyte of data every 20 hours.
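
The arithmetic behind that rule of thumb can be written out in a few lines of Python. The two constants come from the paragraph above; the function name is just for the example, and real throughput will vary with the data set.

```python
PAGES_PER_GB = 70_000      # rule of thumb from the text
PAGES_PER_SECOND = 1       # approximate conversion speed from the text

pages_per_hour = PAGES_PER_SECOND * 3600
hours_per_gb = PAGES_PER_GB / pages_per_hour

def estimate_hours(gigabytes: float) -> float:
    """Estimated conversion time for a data set of the given size."""
    return gigabytes * hours_per_gb

print(pages_per_hour)                 # pages converted per hour
print(round(estimate_hours(1), 1))    # hours for one gigabyte, roughly 20
```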

Our recommendation is to keep the list sizes to 100,000 files, or a 2 gigabyte maximum. That keeps job processing at less than 24 hours per job.

Q: How do you process Word files that have mark-ups within the doc file?

A: The default conversion process converts the document similar to how it was saved. If markups are displayed when the document was last saved, then the markups are printed. There is a difference in how the markups print based on whether you are using Office 2000, or Office 2003. Office 2003 prints much more information about markup changes than Office 2000.

Q: How do you handle password protected files?

A: If the file is password protected, our current default behavior is to time out waiting for the application to print. We then kill the application. The default timeout value is 30 seconds. If there are a lot of password protected files, then conversion is going to go very slowly.

Failed files can be 'moved' to another directory, and then set up for password cracking. Our understanding is that cracking a password can take multiple hours per file, and is not something to try in real time.

At some point in the future, we'll look at trying to determine if a file is password protected before attempting the conversion.

Note: there are a number of 3rd party applications designed to handle password detection and cracking for: Excel, Access, Word, RAR, PDF, Outlook.

Detection:
http://www.ozgrid.com/Services/find-protected-files.htm

Cracking:
http://www.ozgrid.com/Services/access-password-recover.htm

Q: What are the benefits of paying Maintenance and Support?

A: Having paid up maintenance ensures that you have continued access to developer support, and that we can help you with any problems that come up. We understand that your business is to convert documents to TIFF for your client, and that if you have problems, you need them fixed as quickly as possible. If you have specialized requirements, our developers are also available to do custom development.

Q: How do I add support for GIF files?

A: On Windows XP and Windows 2003, the Windows Picture and Fax Viewer can do the job. To set the default, go into explorer, do a search for GIF, then open a GIF. At that point, the file association will be set. Can then do a re-check from Discovery Assistant, and the GIF files will be convertible. Same process for JPEG.

On a Windows 2000 machine, run the Imaging For Windows application, and set the menu item: Tools / General Options - open images in Imaging.

Q: How do I change from using UltraEdit back to making Notepad the default TXT viewer?

A: UltraEdit file associations can be difficult to override, especially if you are looking to use UltraEdit for other purposes.

Fix is to go into Discovery Assistant / Admin / Configure / Documents
Look for .TXT, and select 'Modify'.

Next, put the following command into the 'override cmd' edit box at the bottom of the dialog:
%SystemRoot%\system32\notepad.exe /pt "%1" "%2" "%3" "%4"

Note: Make sure that Notepad File / PageSetup has the header/footers removed....

Q: What is the best setup? workstations writing to a central SQL DB? Workstation with discovery assistant and SQL local?

A: Ideal setup is as follows:

  1. project is broken down into manageable 1 gig bites (less than 200,000 files)

  2. files are loaded into a discovery project file. Ideal is data resides on a server, and references to that server are UNC based (\\server\share).

  3. project files are converted on one or more machines.

  4. problem files are dealt with.

  5. Bates numbers are assigned starting with project 1.

  6. converted files are exported out to a standard load file format.

  7. can use a separate application to convert XML project files to XLS spreadsheet files for reporting and record keeping as to what got converted, what failed, what the source directories are, etc. (I need to send you this application - will be incorporating it into the next Discovery Assistant drop).

  8. load files are loaded back into a SQL database.

Discovery Assistant native database is XML (flat file). We load the whole file into memory, and all database operations are fast. At one point we modeled using an SQL database from within Discovery Assistant. However, the speed of access was slow. Made it difficult to 'assign' large groups of files to different status values, etc.

Q: What are the best practices for planning a multi-batched project?

A: We've been developing the tools to manage terabytes of data. Believe best practices work as follows:

  1. enumerate in a TXT file (or simple flat database file) a list of all files to be processed.

  2. Break this file into smaller 'load files' based on a maximum number of items (50,000), or maximum total cumulative file size (1 gig).

  3. Load these TXT files into DiscoveryAssistant to build fully functional project files.

  4. Do the conversion of the project files

  5. Export the project files as Load files

  6. Load the converted files back into the Discovery application of choice.

We've found that by de-coupling the process from an SQL database, and using XML load files managed entirely in memory, that our ability to access data, compare files for duplicates, and manage project queues of 50,000+ individual files - is significantly sped up over traditional database access times.

Q: What is the procedure for merging batches and renumbering?

A: Quick answer is that you need to first convert before assigning bates numbers.

Am suggesting that if batches are numbered, and delivered in that order, the process of bates numbering can occur concurrent with the delivery of petrified data. Multiple batches can be converted at the same time. Batch 2 can't be bates numbered until Batch 1 conversion is complete.

In the event that files do have to be re-done due to a customer request, there are a number of options that allow you to set bates numbering to whatever start number is decided upon.

Q: What QC functions can be run on processed data while another batch is being processed (on the same machine) if any? or does the current batch need all local resource to be as fast as possible?

A: XML files are used to define the project. If you want to QC a project as it is being converted, then you have to do it in the active conversion session. If you want to QC a project that has already been converted, then this can be done on a second machine. User can open the project file, then view the converted data.

You can install as many 'QC' versions of Discovery Assistant as you like, for no extra charge. The QC versions do not convert, but can 're-queue' files to be converted.

Q: Keyword Searching?

A: Not something we currently offer. Best to export the document MetaData to Concordance or Summation, and then work from there.

As an alternative, it is possible to complete the conversion, then use Google Desktop to index the resulting files.

Q: What are the settings for TimeOuts in Discovery Assistant?

A: Default values are (in seconds):

HKLM\Software\ImageMaker\xdc\Settings

  TimeoutFirstPage    INT  120
  TimeoutNextPage     INT  120
  TimeoutStart        INT  60
  TimeoutTotal        INT  600
  TimeoutMaxPages     INT  0
  TimeoutPrintQueue   INT  60

If you want to make the timeout infinite (for really large multi-page files):

Discovery Assistant / Admin / Configure / Timeouts
Set Timeout Total to 0 (infinite).

If you are running really large files, we find that the spooler sometimes runs out of room. If so, open the print driver's Advanced settings (Start / Settings / Printers / Properties / Advanced tab) and change "spool print documents" to "Print directly to the printer".
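If you want to check the current values from a script rather than the Admin dialog, something like the following may work. It is a sketch under assumptions: that the values are stored as integers under the key listed above, and that a missing key or value means the documented default applies. The function names are hypothetical, and on non-Windows machines the code simply returns the defaults.

```python
import sys

REGISTRY_PATH = r"Software\ImageMaker\xdc\Settings"

# Defaults as listed in the answer above.
DEFAULT_TIMEOUTS = {
    "TimeoutFirstPage": 120,
    "TimeoutNextPage": 120,
    "TimeoutStart": 60,
    "TimeoutTotal": 600,
    "TimeoutMaxPages": 0,
    "TimeoutPrintQueue": 60,
}


def read_timeouts():
    """Return the timeout settings, falling back to the documented defaults."""
    timeouts = dict(DEFAULT_TIMEOUTS)
    if sys.platform != "win32":
        return timeouts  # no registry available; report defaults only
    import winreg
    try:
        key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, REGISTRY_PATH)
    except OSError:
        return timeouts  # key not present on this machine
    with key:
        for name in DEFAULT_TIMEOUTS:
            try:
                value, _value_type = winreg.QueryValueEx(key, name)
                timeouts[name] = int(value)
            except (OSError, ValueError, TypeError):
                pass  # value missing or not numeric; keep the default
    return timeouts


def total_timeout_seconds(timeouts):
    """Return None when TimeoutTotal is 0, which the answer above describes as infinite."""
    total = timeouts["TimeoutTotal"]
    return None if total == 0 else total


if __name__ == "__main__":
    for name, value in read_timeouts().items():
        print(name, value)
```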

Q: When Discovery Assistant screens date macros from putting in today's dates in Word and Excel documents, what date if any does in fact appear in the petrified image? Is the date macro removed entirely and no date appears? does the date of last modified appear?

A: We switch the machine date to the date of the image being rendered (the date it was last saved). The macro still runs, and the date gets filled in. It's difficult to disable ALL macros. It's also misleading to not put in a date.

When we output Word or Excel documents we set UpdateFields=false (using WordPrintTo.exe, ExcelPrintTo.exe) so date fields don't get changed from their last value. This operates independently of the feature that temporarily changes the system date (enacted through the admin interface). Usually changing the system date is only required if the conversion process produces headers/footers with the date/time and these are required to reflect the last saved date/time. Headers/Footers can usually be turned off for most documents by opening the parent application (Word, Excel, etc.) with a blank document, selecting PageSetup from the file menu, and removing the header/footer.

NOTE: for this feature to work, you must turn ON the Date Handler in Discovery Assistant. To do this, go to the Admin dialog, select Configure, and under the Options tab, select: 'Reset System Time to LastWrite Time before conversion.'

Warning: with this option turned on, do not use this machine during conversion for other business functions.

Q: My PST file contains 27,000 emails, each of which contains a signature file. In total there are about 10 different signature files. When I go to export to Summation or Concordance, for each signature file reference, the list of duplicates is enormous. The DII file itself is larger than 80 Gigabytes. What can I do?

A: Go back to the AllFiles tab. You should be able to locate one or more of these common signature files. Identify the FileID and Hash value, then sort on hash value. Once you've re-located the signature file in question, delete all the copies of this file from the data set (OK to leave one). You may want to make a note of the number of copies you've deleted. Do this for any of the other signature files that are a problem. Then, go back and try re-exporting. The Data file should now be much smaller.
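The "sort on hash value" step amounts to grouping items by hash and looking for hashes with a very large number of copies. The sketch below shows that idea on a plain list of (FileID, hash) pairs. It assumes you have those two columns available in some form outside the product; the function name and the `min_copies` cut-off are hypothetical.

```python
from collections import defaultdict


def plan_signature_cleanup(records, min_copies=100):
    """Group (file_id, hash_value) pairs and flag heavily duplicated hashes.

    Returns {hash_value: {"keep": first_file_id, "delete": [other_file_ids]}}
    for every hash that occurs at least min_copies times.
    """
    by_hash = defaultdict(list)
    for file_id, hash_value in records:
        by_hash[hash_value].append(file_id)

    plan = {}
    for hash_value, file_ids in by_hash.items():
        if len(file_ids) >= min_copies:
            # Leave one copy in the data set, as suggested above.
            plan[hash_value] = {"keep": file_ids[0], "delete": file_ids[1:]}
    return plan


records = [(1, "aa"), (2, "bb"), (3, "aa"), (4, "aa")]
for hash_value, action in plan_signature_cleanup(records, min_copies=3).items():
    print(hash_value, "keep", action["keep"], "delete", len(action["delete"]), "copies")
```

The length of each "delete" list is the number of copies to note down before removing them.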

Q: Recently, my system has been crashing more often, due to a process or program "CiceroUIWndFrame".

A: I found out that this is the "Speech and Handwriting Recognition" part of Office XP. To uninstall it, go to:

  1. "Control Panel"

  2. "Add/Remove Programs"

  3. "Microsoft Office," click on the "Change" button

  4. browse to "Office Shared Features," "Alternative User Input," and select for Speech and Handwriting Recognition (both) "Not available" from the drop-down box.

  5. Perform the change and the CiceroUIWndFrame messages disappear. It takes just a second, and you don't even have to restart your system or insert the Office XP CD.

  6. Under Windows 2003, the Control Panel applet is Text Services. You want to remove the Handwriting recognition services.

Q: If I have a job already stamped, but don't want to print the whole thing, is there a way I can select which pages to print? What happened was the computer and/or printer caused the print job to stop midway through (I assume the software won't cause this).

A: To print just one file from the list:

If you 'double click' on the image in the stamped tab, that will bring up a viewer (ours).
You can then 'print' that image by specifying which pages you want printed.

To print all pages of multiple files in the list:

Select the files you want to print from the list (Shift-click or Ctrl-click), then select the Print button. It should print only the files you've selected.

To print selected pages of multiple files in the list:

Double click on each file, and use the viewer to print.

Q: Any information you have on performance, e.g. number of files converted per minute?

A: In regard to speed, there are no hard numbers. We tested 7 WORD files (simple graphics, lots of text) with the following page counts: 3, 71, 3, 5, 16, 3, 204

Test machine: 3.2 GHz, no hyper-threading, 1 GB of memory, and a big hard drive.

Output DPI   Output Format   Pages per Minute   Pages per Minute without last file
300          G4              84                 70
300          G3              87                 70
300          G3              257                206   <-- dithering set to Windows Fast Dither.
200          G4              150                130
200          G3              150                130
200          G3              332                270   <-- dithering set to Windows Fast Dither.

At 300 dpi, conversion speed ranges from 70 to 84 pages per minute. For smaller files (one page per file), speeds have been seen to go as low as 30 pages per minute.

At 200 dpi, conversion speed ranges from 130 to 150 pages per minute. For smaller files (one page per file), conversion speed can go as low as 30 pages per minute.

For files containing a large number of pages, conversion speeds are somewhere above 200 pages per minute.
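These figures can be turned into a rough duration estimate for a job. The sketch below is simple arithmetic on the ranges quoted above, not a performance guarantee. The function name is hypothetical, and the example uses the 305 pages of the seven-file test set (3 + 71 + 3 + 5 + 16 + 3 + 204).

```python
def estimate_minutes(total_pages, ppm_low, ppm_high):
    """Return (fastest, slowest) conversion time in minutes for a page count.

    ppm_low and ppm_high are the slow and fast ends of a pages-per-minute range.
    """
    if total_pages < 0 or ppm_low <= 0 or ppm_high < ppm_low:
        raise ValueError("need total_pages >= 0 and 0 < ppm_low <= ppm_high")
    return total_pages / ppm_high, total_pages / ppm_low


test_set_pages = sum([3, 71, 3, 5, 16, 3, 204])

# 300 dpi range quoted above: 70 to 84 pages per minute.
fastest, slowest = estimate_minutes(test_set_pages, 70, 84)
print(f"{test_set_pages} pages at 300 dpi: {fastest:.1f} to {slowest:.1f} minutes")
```

For data sets made up mostly of one-page files, using 30 pages per minute as the low end gives a more cautious estimate.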

Conversion speed can be increased dramatically (doubled) if you switch to Windows Fast Dithering as the dithering option (Printing Properties / General Tab / Printing Preferences / Advanced / Image Rendering Options - color mode).

Basic trend:

  • Speed is greatly enhanced by setting the default dither output to 'Windows Fast Dither' (image quality can be slightly compromised).
  • More graphically complicated files take longer to convert.
  • The higher the output resolution, the slower the conversion.
  • Additional processing for MSG and PST results in slower conversion.
  • There is a slight performance penalty for saving in G4 format.

The Windows Fast Dither uses a reduced memory area for conversion, and 'dithers' the text and graphics to B&W as they are being written to the surface. The Error Diffusion Dithering method dithers the whole image when it is being written to file (and can take up significantly more memory).
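For readers unfamiliar with the term, error diffusion means that the rounding error made when a pixel is forced to black or white is pushed onto neighbouring pixels that have not been processed yet. The sketch below uses the Floyd-Steinberg weights, which are a common choice. Which error-diffusion variant the print driver actually uses is not stated here, so treat the weights and the function name as assumptions.

```python
def error_diffusion_dither(gray_rows, threshold=128):
    """Dither a grayscale image (rows of 0-255 values) to pure black and white.

    Illustrative Floyd-Steinberg error diffusion; not the driver's own code.
    """
    height = len(gray_rows)
    width = len(gray_rows[0]) if height else 0
    # Working copy of the whole image, so accumulated error can be stored.
    work = [[float(value) for value in row] for row in gray_rows]
    out = [[0] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 255 if old >= threshold else 0
            out[y][x] = new
            error = old - new
            # Spread the error to the right and to the row below.
            if x + 1 < width:
                work[y][x + 1] += error * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += error * 3 / 16
                work[y + 1][x] += error * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += error * 1 / 16
    return out


mid_gray = [[128] * 8 for _ in range(4)]
for row in error_diffusion_dither(mid_gray):
    print("".join("#" if pixel == 0 else "." for pixel in row))
```

Because the error carries over from pixel to pixel, flat gray areas come out as a mix of black and white dots rather than a solid block, which is why this approach suits halftone (photo) content.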

Default for the ODC Carrier is to set 'Windows Fast Dither' to on. Discovery Assistant currently sets Error Diffusion on, because Error Diffusion produces much better halftone (photo) output, and we have to assume that there will be a significant amount of halftone in the documents we are processing.

Q: One problem I have run into is that DA has difficulty, and I often get an error message, when I try to convert a .pdf file that is larger than 500 pages. What can I do?

A: We believe the PDF problem is related to running out of spool file room. When printing, the PDF application puts data into the spooler much more quickly than we can pull the data out.

The quick fix is to go into the printer properties / Advanced tab for the ImageMAKER XDC Service1, and set the spooler properties to 'print directly to printer'. This means that as soon as data goes into the spooler queue we take it back out. As a result, Acrobat is going to remain open much longer.

The only problems we are aware of with always having this setting on are that some Word documents with landscape/portrait pages don't print correctly, and that some documents will print more slowly.

Q: How does your software handle error reporting, if at all?

A: When you import files to be converted, the product first separates the files that can be converted from those that can't.

Then, during the conversion process if there are any failures, the failed files are listed in the 'failure' tab. Converted files are listed in the 'converted' tab. Files can be 're-converted' by moving them back into the 'queued' tab.

We've tested file lists of up to 250,000 files with no problems. If you need to convert larger numbers of files, then we suggest breaking them down into sets of 100,000.
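If you build your import lists from a script, breaking a large list into sets is straightforward. A minimal sketch, assuming the file list is already available as a Python list of paths (the function name is hypothetical):

```python
def split_into_sets(file_paths, set_size=100_000):
    """Split a list of file paths into consecutive sets of at most set_size items."""
    if set_size <= 0:
        raise ValueError("set_size must be positive")
    return [file_paths[i:i + set_size] for i in range(0, len(file_paths), set_size)]


# Example with placeholder names: 250,001 files become three sets.
paths = [f"file_{n}.doc" for n in range(250_001)]
for index, subset in enumerate(split_into_sets(paths), start=1):
    print(f"set {index}: {len(subset)} files")
```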

Q: Does the software deduplicate emails? If so, based on what criteria?

A: We've built in automatic duplicate removal. We create checksums for each item (email, attachment, or file) and compare them to those of all other items in the list. Before confirming a match, we do a complete byte compare between the two files. For emails, we compare the 'text' contents rather than the whole MSG file, because every MSG file is unique. The user then has the option of converting the duplicates (or not), and exporting the duplicates (or not).
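The two-stage check (checksum first, then a full byte compare) can be sketched as follows. This is an illustration of the general technique for whole files only, not the product's code: SHA-256 is an assumed checksum (the algorithm the product uses is not stated here), the function names are hypothetical, and the email text comparison is not covered.

```python
import filecmp
import hashlib
from collections import defaultdict


def file_checksum(path, chunk_size=1 << 20):
    """Return a hex checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()  # assumed algorithm, for illustration only
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return (original, duplicate) pairs confirmed by a byte-for-byte compare."""
    originals_by_checksum = defaultdict(list)
    duplicates = []
    for path in paths:
        candidates = originals_by_checksum[file_checksum(path)]
        for original in candidates:
            # A matching checksum is only a candidate; confirm with a full compare.
            if filecmp.cmp(original, path, shallow=False):
                duplicates.append((original, path))
                break
        else:
            candidates.append(path)  # first file seen with this content
    return duplicates
```

The checksum keeps the number of full comparisons small, and the byte compare guards against two different files that happen to share a checksum.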

Q: How does the software handle Excel spreadsheets? Does it remove blank pages automatically?

A: We've developed some specialized Excel spreadsheet software to do the following:

  • print all sheets, not just the currently activated one
  • print the defined print range. If no range is defined, then print the entire sheet
  • the user can limit the number of pages wide and pages high, so that large sheets fit into a pre-defined number of pages (Admin/Configure/Excel Options)
  • we have just completed development of a 'blank page removal' tool. We can provide you with a stand-alone demo of this, and will be incorporating the feature into Discovery Assistant after we've done some more testing.
  • we have prototype tools in-house to dump out the formula contents of a spreadsheet on a cell-by-cell basis. Our understanding is that this exposes all 'text' in the spreadsheet for lexical analysis.

Q: What do I do to ensure there is white space for the bates stamp? I want to ensure that the Bates Stamp does not obscure the underlying data.

A: The solution is to convert into an area that is smaller than the original image size.

The print driver has a setting that can modify the page margins. Default is 0 margins (output TIFF image size is same as input TIFF image size).

Go to Settings / Printers. Select properties for ImageMaker XDC Service1. Under the Device Settings tab, look for unprintable regions. Set the unprintable regions so that you have room for the bates stamp.

Suggest the following settings:

  • top: 0
  • right: .25
  • left: .25
  • bottom: .5
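To see how much of the page these settings leave for the document, the arithmetic can be sketched as below. It assumes the region values are in inches and uses a letter-size page (8.5 x 11) as an example; neither is stated above, and the function name is hypothetical.

```python
def printable_area_pixels(page_width_in, page_height_in, dpi,
                          top=0.0, right=0.25, left=0.25, bottom=0.5):
    """Return (width, height) in pixels left after the unprintable regions.

    All page and region measurements are assumed to be in inches.
    """
    width_in = page_width_in - left - right
    height_in = page_height_in - top - bottom
    if width_in <= 0 or height_in <= 0:
        raise ValueError("unprintable regions leave no room on the page")
    return round(width_in * dpi), round(height_in * dpi)


# Letter page at 300 dpi with the suggested settings.
width, height = printable_area_pixels(8.5, 11, 300)
print(f"printable area: {width} x {height} pixels")
print(f"space reserved at the bottom: {round(0.5 * 300)} pixels")
```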

Q: What happens if one of the servers or clients crashes in the middle of a conversion? How does DA recover? How does the user manage events like this?

A: All conversions are controlled by the client, but can be handled on either a client or server.

In the event that the server machine dies, the client will time out, then go on to try the next item in the list. In the event that the client machine dies, the user can restart that machine, and it will pick up where it left off. All conversion status information is maintained in an XML file, which is updated after every conversion.
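The recovery behaviour follows from saving status after every item. The sketch below shows the general pattern, assuming a made-up XML layout (`<project>` with `<item name="..." status="..."/>` entries). The real project file format is not described here, and all function names are hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def save_status(path, statuses):
    """Write {file_name: status} to XML, replacing the old file in one step."""
    root = ET.Element("project")  # hypothetical layout, for illustration only
    for name, status in statuses.items():
        ET.SubElement(root, "item", name=name, status=status)
    temp_path = path + ".tmp"
    ET.ElementTree(root).write(temp_path, encoding="utf-8", xml_declaration=True)
    os.replace(temp_path, path)  # avoids leaving a half-written status file


def load_status(path):
    """Read the status file back; an absent file means nothing was converted."""
    if not os.path.exists(path):
        return {}
    root = ET.parse(path).getroot()
    return {item.get("name"): item.get("status") for item in root.iter("item")}


def run_conversions(path, file_names, convert):
    """Convert each file once, recording the outcome after every item."""
    statuses = load_status(path)
    for name in file_names:
        if statuses.get(name) == "converted":
            continue  # already done before the restart
        try:
            convert(name)
            statuses[name] = "converted"
        except Exception:
            statuses[name] = "failed"
        save_status(path, statuses)
    return statuses
```

Running `run_conversions` again after a restart skips everything already marked as converted, so at most the item that was in progress when the machine went down is repeated.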

For more information

ImageMAKER Development Inc.
416 Sixth Street, Suite 102
New Westminster, BC
Canada V3L 3B2
http://www.imgmaker.com
Copyright © 2004-2008
To contact us from overseas:

Sales: 1.604.525.2170
Local (Pacific) time: GMT-8

Sales: toll free (866) 525-2170
or (604) 525-2170
Support: (604) 525-2108
Fax: (604) 520-0029
Email: sales@imgmaker.com
support@imgmaker.com