ConceptLogic

The purpose of Open Legacy Storage Mail (OpenLSM) is to archive mail (email) on an Open Legacy Storage Document Server and to retrieve those emails through a web query interface. OpenLSM is a mature framework for the mail archiving process. It can easily be adapted to the needs and standards of your business unit, your enterprise or your specific organization.

There are two usual approaches to archiving mail.

The first is based on the mail server, which archives almost all mail according to some rules. This approach is convenient for most users, but it is problematic from both an infrastructure and a privacy protection point of view. It is difficult to identify which mails should or should not be archived, especially spam and personal email, so a huge number of mails could be archived automatically, including spam and personal emails. Also, in a thread of messages, probably only the last one is worth archiving. From an infrastructure point of view, archiving from the mail server therefore implies storing a huge number of emails, with users bearing no responsibility for that volume. From a privacy protection point of view, personal emails could be archived, which would be contrary to legal requirements. However, if one wants to use OpenLSM to archive emails from the mail server, it is still possible using one of the following options:
1. Create an SMTP server that receives email from the mail server and runs the import task once each file is received
2. Create a bridge with your mail server that saves each mail to a file and runs the import task on that file

In both cases, you will have to take care of the analysis, the rules (exclusions, duplicates, …) and the indexing of these emails before the import process. Once that is done, the import process is the same as with the preferred approach described next.
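The pre-import checks mentioned above (exclusion rules and duplicates) could be sketched as follows. This is only an illustration under stated assumptions: the function name `should_import`, the excluded-domain rule and the use of the Message-ID header to spot duplicates are all hypothetical and are not part of OpenLSM.

```python
from email import message_from_bytes, policy
from email.utils import parseaddr

# Hypothetical exclusion rule: sender domains that are never archived.
EXCLUDED_SENDER_DOMAINS = {"newsletter.example.com"}


def should_import(raw_mail: bytes, seen_ids: set) -> bool:
    """Return True if the mail passes the exclusion and duplicate checks.

    seen_ids holds the Message-ID values already accepted; it is updated
    in place when a new mail is accepted.
    """
    msg = message_from_bytes(raw_mail, policy=policy.default)

    # Exclusion rule: drop mails whose sender domain is on the list.
    sender = parseaddr(str(msg.get("From", "")))[1]
    domain = sender.rsplit("@", 1)[-1].lower()
    if domain in EXCLUDED_SENDER_DOMAINS:
        return False

    # Duplicate rule (assumption): same Message-ID means same mail.
    message_id = str(msg.get("Message-ID", "")).strip()
    if message_id:
        if message_id in seen_ids:
            return False
        seen_ids.add(message_id)
    return True
```

A real deployment would probably keep the set of seen identifiers in persistent storage and apply richer rules; the in-memory set here only keeps the sketch short.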

The other approach is based on the client side, where the user is responsible for deciding which emails should be archived. Filtering is then done by each user, who should not archive spam or personal emails, and who can archive a single mail out of the many in a thread. Another benefit is that the user can specify keywords when deciding to archive an email, which makes later queries easier. This approach also makes it possible to implement a per-user archiving quota, which limits the load on the storage infrastructure if necessary.
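A per-user quota of the kind mentioned above might be sketched like this. The class `ArchiveQuota`, its byte-based limit and its in-memory bookkeeping are assumptions made for illustration; the text does not say how OpenLSM would count or store quota usage.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveQuota:
    """Hypothetical per-user quota, counted in bytes of archived mail."""
    max_bytes_per_user: int
    used: dict = field(default_factory=dict)  # user -> bytes already archived

    def try_reserve(self, user: str, mail_size: int) -> bool:
        """Record the mail against the user's quota if it still fits."""
        current = self.used.get(user, 0)
        if current + mail_size > self.max_bytes_per_user:
            return False  # archiving this mail would exceed the quota
        self.used[user] = current + mail_size
        return True
```

The check would run before the mail is sent for archiving, so that a refused mail never reaches the storage infrastructure.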

Nevertheless, this approach leaves two options. If two users decide to archive the same email (both received it), one option is to archive the email twice (with two different owners); the other is to archive it only once and give both users access to it. For now, the simple version of OpenLSM archives the email twice, because it can be difficult to check precisely whether two emails are equal.
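To illustrate why equality is hard to check, here is one possible heuristic, which is not what OpenLSM does: two copies of the same mail usually differ in their transport headers (for example Received), so a byte-for-byte comparison fails. The sketch below instead hashes a few stable headers and the whitespace-normalized body. The function name `mail_fingerprint` and the choice of headers are assumptions, and mail servers that rewrite subjects or bodies would still defeat it.

```python
import hashlib
from email import message_from_bytes, policy

# Headers assumed to be identical in every recipient's copy (an assumption).
FINGERPRINT_HEADERS = ("Message-ID", "From", "Date", "Subject")


def mail_fingerprint(raw_mail: bytes) -> str:
    """Return a SHA-256 hex digest over stable headers and the text body."""
    msg = message_from_bytes(raw_mail, policy=policy.default)
    digest = hashlib.sha256()
    for name in FINGERPRINT_HEADERS:
        value = str(msg.get(name, "")).strip()
        line = name.lower() + ":" + value + "\n"
        digest.update(line.encode("utf-8", "replace"))
    # Prefer the plain-text body, fall back to HTML; ignore attachments.
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body is not None else ""
    # Collapse whitespace so re-wrapped lines do not change the fingerprint.
    digest.update(" ".join(text.split()).encode("utf-8", "replace"))
    return digest.hexdigest()
```

Two archive requests with the same fingerprint could then be treated as candidates for single storage with shared access, leaving the final decision to a stricter comparison.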

Currently, the mail is archived using the following data as index information:

the sender
the recipients (to and cc)
the identity of the user who asks to archive this email
the mail account that received this email (a business email address is not always attached to one specific person, so it is necessary to store which email account the email was received on)
the subject
the date of this email
extra keywords given by the user during the archive process
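The index fields listed above could be collected from a raw eml message and written to an XML index file roughly as follows. This is a minimal sketch: the `MailIndex` class, the function names and every XML element name are hypothetical, since the actual index schema used by OpenLSM is not described here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from email import message_from_bytes, policy
from email.utils import getaddresses, parseaddr


@dataclass
class MailIndex:
    """Index record; field names are illustrative, not OpenLSM's own."""
    sender: str
    recipients: list   # addresses from To and Cc
    archived_by: str   # identity of the user who asked for archiving
    account: str       # mail account that received the email
    subject: str
    date: str
    keywords: list = field(default_factory=list)


def build_index(raw_mail: bytes, archived_by: str, account: str,
                keywords: list) -> MailIndex:
    """Extract the header-based fields and attach the user-supplied ones."""
    msg = message_from_bytes(raw_mail, policy=policy.default)
    to_cc = [str(v) for v in (msg.get_all("To") or []) + (msg.get_all("Cc") or [])]
    recipients = [addr for _name, addr in getaddresses(to_cc) if addr]
    return MailIndex(
        sender=parseaddr(str(msg.get("From", "")))[1],
        recipients=recipients,
        archived_by=archived_by,
        account=account,
        subject=str(msg.get("Subject", "")),
        date=str(msg.get("Date", "")),
        keywords=list(keywords),
    )


def index_to_xml(index: MailIndex) -> str:
    """Serialize the index; element names are assumptions for illustration."""
    root = ET.Element("mailindex")
    ET.SubElement(root, "sender").text = index.sender
    for recipient in index.recipients:
        ET.SubElement(root, "recipient").text = recipient
    ET.SubElement(root, "archivedby").text = index.archived_by
    ET.SubElement(root, "account").text = index.account
    ET.SubElement(root, "subject").text = index.subject
    ET.SubElement(root, "date").text = index.date
    for keyword in index.keywords:
        ET.SubElement(root, "keyword").text = keyword
    return ET.tostring(root, encoding="unicode")
```

The user identity, the receiving account and the keywords cannot be read from the message itself, which is why they are passed in as separate arguments.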

No analysis or transformation (eml to pdf, for instance) is performed on the email, although this could be added.

The web query interface is not fully functional and is only intended to show how to create an interface for OpenLSM. For instance, there is no Single Sign-On, no identity check and no cross-identity handling (for generic business email accounts), but these can easily be added using internal processes according to your needs. Those points are outside the scope of this framework.

When the user selects emails and clicks the button in his/her email client, a window asks the user to give some specific keywords for each email. When this is done, the email and an xml file containing the indexes are sent to the OpenLSM server. This transfer can be done in three ways:

If the organization is small, an OpenLSM client can be used, but it needs write access to the database (to be able to insert the indexes). I would of course not recommend this option.
The mail and the index can be sent to the OpenLSM server using a specific protocol (standard or not), for instance FTP. This approach is easy to implement but requires some extra attention on the server side, which must poll for received files in order to run their import. This could be done using the auto-import function of OpenLSD, on which OpenLSM is built.
The mail and the index can be sent to the OpenLSM server using the OpenR66 protocol with a light client (no database, no logs), which makes it possible to trigger a business process once both files are received (extra actions such as eml to pdf transformation or email analysis, …) before the final import into the OpenLSM server. With this approach there is no need to poll the directory where the files are received.
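For the plain file transfer option (FTP, for instance), the server-side polling could look like the sketch below. It is not the OpenLSD auto-import function: the naming convention (an `.eml` file and an `.xml` file sharing the same base name), the `done` subdirectory and the `run_import` callback are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def find_complete_pairs(inbox: Path) -> List[Tuple[Path, Path]]:
    """Return (mail, index) pairs for which both files have arrived."""
    pairs = []
    for index_file in sorted(inbox.glob("*.xml")):
        mail_file = index_file.with_suffix(".eml")
        if mail_file.is_file():
            pairs.append((mail_file, index_file))
    return pairs


def poll_once(inbox: Path, run_import: Callable[[Path, Path], None]) -> int:
    """Import every complete pair once, then move it out of the inbox.

    Returns the number of pairs imported. This sketch does not detect
    files that are still being uploaded; a real poller would need to
    (for example by uploading under a temporary name and renaming).
    """
    done = inbox / "done"
    done.mkdir(exist_ok=True)
    imported = 0
    for mail_file, index_file in find_complete_pairs(inbox):
        run_import(mail_file, index_file)
        # Move both files so the next poll does not import them again.
        mail_file.rename(done / mail_file.name)
        index_file.rename(done / index_file.name)
        imported += 1
    return imported
```

Calling `poll_once` on a timer approximates the polling described for the FTP option, whereas the OpenR66 option avoids this loop because the transfer itself triggers the follow-up processing.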

Right now, the email client button exists only as an extension for Thunderbird (1.5 and 2.0), since the project needs to be Open Source and also because I don’t have access to the specific tools for other email clients such as Outlook or Outlook Express. However, this could easily be done with help from users who have experience with those clients.