
Concept and Logic of OpenLSD ML

 

  • Concept and Logic of ML

  • Technical Aspects of ML

  • Howto enter in production with ML

  • Perspective of ML

     

    Multiple Legacies (ML) is the ability of OpenLSD to maintain a mirror of the documents inserted into a Legacy across several (1 to n) servers that implement this Legacy.

    One can see a Legacy as a virtual storage. Each virtual storage can have 1 to n implementations on separate servers, which increases the security of the documents archived in this Legacy.

     

    A huge number of files makes tape backup impractical, so this kind of mirror should be considered a real security measure for the archive, and probably the only one.

     

    Consider also an organization with a wide network area: this kind of mirror can be used according to the localization of each user. For instance, imagine one OpenLSD Server in the US, another in Europe, another in Asia and a fourth in Africa. Users from Asia should then access the Asia instance first, in order to get fast answers without using intercontinental network links. All of this assumes that every mirror holds the same archived documents.
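The nearest-instance idea above can be sketched as a simple lookup. This is only an illustration, not part of OpenLSD: the region names and host names are hypothetical.

```java
import java.util.Map;

/**
 * Hypothetical sketch: route each client to the nearest OpenLSD Server
 * implementing the same Legacy. Regions and host names are illustrative only.
 */
public class MirrorRouter {
    private static final Map<String, String> NEAREST = Map.of(
            "us",     "lsd-us.example.org",
            "europe", "lsd-eu.example.org",
            "asia",   "lsd-asia.example.org",
            "africa", "lsd-africa.example.org");

    /** Returns the closest mirror for a region, falling back to a main server. */
    public static String pickServer(String region) {
        return NEAREST.getOrDefault(region, "lsd-main.example.org");
    }

    public static void main(String[] args) {
        // Users from Asia get the Asia instance, avoiding intercontinental links.
        System.out.println(pickServer("asia"));
    }
}
```

Any real deployment would of course derive the region from the client's network location rather than a literal string.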

     

    Three approaches can be used to obtain a mirror of the storages:

    • Using a physical mirror at the storage-system level; this option is easy to implement but can be expensive.

    Its advantage is that it simplifies the handling of the mirror, since everything is done at the physical level.

    Among its disadvantages, price can be a problem: most of the time the solution implies either a fiber link between the two sites, to maintain good latency and bandwidth for the SAN network, or at least a TCP/IP link with really good latency and bandwidth. Another problem is that any bad action by an administrator has an immediate effect on the copy (Murphy's Law: someone wants to clean something up because there is a problem, in fact deletes a subdirectory full of files, and with the mirror this deletion is propagated immediately to the second site with no way back).

    It should nevertheless be considered before switching to another solution, since it is quite simple and efficient.

    • Using an application mirror at the application level; this option needs more work on the application side, but it is a good way to know exactly what is replicated and what is not.

    Its advantage is that it can handle application-specific behavior, and it relies only on a simple network link.

    However, it implies integrating the replication system into the application logic, which can be difficult if it was not planned from the start.

    You have to double-check what happens when the network link or the second server is down, and decide what the application must do in such a situation. Most of the time, the application will store zip files (or equivalent) containing the new files, ready to be sent to the second server once it is reachable again. You also have to double-check the delete process on both sides.

    • Using the OpenLSD mirror at the OpenLSD level; this option is ready for production in OpenLSD but needs some attention to make sure it matches your needs.

    Its advantage is that it does the job for you and relies on a simple network link. It already takes care of deletion and of network or server outages.

    You need to use the new ML interface instead of the standard OpenLSD one. There are also some production tips to know.
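The spooling idea mentioned for the application-level mirror can be sketched as follows. This is a minimal illustration, not OpenLSD code: while the second server is unreachable, new files are packed into a zip in a local spool directory, to be pushed once the link is back. Class and path names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/**
 * Hypothetical sketch: when the mirror is down, batches of new files are
 * kept as zips in a spool directory until they can be transferred.
 */
public class ReplicationSpool {
    private final Path spoolDir;

    public ReplicationSpool(Path spoolDir) throws IOException {
        this.spoolDir = Files.createDirectories(spoolDir);
    }

    /** Pack the given files into one zip that can be sent later. */
    public Path spool(String batchName, Path... files) throws IOException {
        Path zip = spoolDir.resolve(batchName + ".zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            for (Path f : files) {
                out.putNextEntry(new ZipEntry(f.getFileName().toString()));
                Files.copy(f, out);
                out.closeEntry();
            }
        }
        return zip;
    }

    /** Zips still waiting to be transferred to the mirror. */
    public long pendingCount() throws IOException {
        try (var entries = Files.list(spoolDir)) {
            return entries.filter(p -> p.toString().endsWith(".zip")).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("lsd-demo");
        Path doc = Files.writeString(tmp.resolve("doc1.txt"), "archived document");
        ReplicationSpool spool = new ReplicationSpool(tmp.resolve("spool"));
        spool.spool("batch-001", doc);            // link down: keep it locally
        System.out.println(spool.pendingCount()); // one batch waiting
    }
}
```

A real implementation would also record the corresponding deletions in each batch, for the reasons given above.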

     

    The main idea is that each time an import or a delete is done, it is performed on a “main” server (most of the time, the one closest to the database), which then stores in the database the actions still to be done on the other servers that implement the same Legacy.

     

    The replication is therefore asynchronous and starts only after a successful action. For instance, the replication of an import starts once the first import has completed successfully; the same holds for a deletion.

    The asynchronous scheme relies on the database for persistence: it stores the actions that remain to be done. For a specific document, once all related actions have completed successfully, the corresponding entries are deleted from the database.
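The scheme described above can be sketched as a pending-actions queue. This is an illustration only: a real deployment persists these entries in the OpenLSD database, whereas an in-memory map stands in for it here, and all names are hypothetical.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: after a successful import or delete on the "main"
 * server, one pending action per mirror is recorded; when every mirror has
 * acknowledged the action for a document, its entries are purged.
 */
public class PendingActions {
    // documentId -> mirrors that still have to replay the action
    private final Map<Long, Set<String>> pending = new HashMap<>();

    /** Called once the action has succeeded on the main server. */
    public void record(long docId, Collection<String> mirrors) {
        pending.put(docId, new HashSet<>(mirrors));
    }

    /** Called when one mirror acknowledges; purges when all are done. */
    public void markDone(long docId, String mirror) {
        Set<String> left = pending.get(docId);
        if (left != null) {
            left.remove(mirror);
            if (left.isEmpty()) pending.remove(docId); // all mirrors in sync
        }
    }

    public boolean isFullyReplicated(long docId) {
        return !pending.containsKey(docId);
    }

    public static void main(String[] args) {
        PendingActions queue = new PendingActions();
        queue.record(42L, List.of("eu", "asia"));
        queue.markDone(42L, "eu");
        System.out.println(queue.isFullyReplicated(42L)); // still waiting for "asia"
        queue.markDone(42L, "asia");
        System.out.println(queue.isFullyReplicated(42L)); // entries purged
    }
}
```

Keeping the queue in the database means a crash of the main server loses no replication work: pending entries are simply replayed on restart.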

     

    ML support can be enabled even after production has started without it, and the reverse is also possible, so you can switch a Legacy to or from ML as you want. It would, however, not be a good idea to go, for instance, from no ML to ML, then back to no ML and again to ML, since each time you enable ML support you have to resynchronize the Legacy servers.

     

    There are several kinds of checks in OpenLSD: files checked from the database point of view, the database checked from the files point of view, and each of them can be run on every component of one ML. There is also a specific function to resynchronize, if necessary, one or more components of one ML. For instance, this function can be used to start an ML instance after production has begun without ML support, or to resynchronize a site that had a problem (such as a storage failure).
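One direction of such a check (database against files) can be sketched as a set difference: any document known to the database but absent on a mirror is flagged for resynchronization. This is an illustration under assumed data structures, not the actual OpenLSD check, which walks the Legacy storage itself.

```java
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of one check direction: documents listed in the
 * database but missing on a mirror must be re-pushed to that mirror.
 */
public class MirrorCheck {
    /** Document ids present in the database but absent on the mirror. */
    public static Set<Long> missingOnMirror(Set<Long> inDatabase, Set<Long> onMirror) {
        Set<Long> missing = new TreeSet<>(inDatabase);
        missing.removeAll(onMirror);
        return missing;
    }

    public static void main(String[] args) {
        Set<Long> database = Set.of(1L, 2L, 3L);
        Set<Long> mirror = Set.of(1L, 3L);
        // Document 2 was never replicated (or was lost) and must be resynchronized.
        System.out.println(missingOnMirror(database, mirror));
    }
}
```

The reverse direction (files present on a mirror but unknown to the database) is the same comparison with the arguments swapped.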

     

    One can also use this ML support not only for security but also for efficiency, since web services (even import) can be implemented using the closest OpenLSD Server as a component of one ML.

     

    The database is unique, since it is the kernel of the OpenLSD implementation and ensures efficiency and security. However, one should take care of the replication of this database, since it is not handled by the ML support. The reason is that the database may contain business tables unrelated to OpenLSD, so this framework cannot take care of that replication itself. On a very large network, the replication should follow a master/slaves plan, even if the master changes from time to time (for instance according to the working hours across the world). Depending on the database software used, several options exist. An application-level replication scheme can also be used, where the application takes care of the database replication by assembling the SQL orders into a file and pushing it to the other sites (that is the option we chose).
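The application-level scheme mentioned last can be sketched as an SQL journal: each order executed on the master is also appended to a file that is later shipped to the other sites and replayed there. This is a minimal illustration with hypothetical names and statements, not the actual mechanism used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/**
 * Hypothetical sketch: SQL orders executed on the master site are appended
 * to a journal file, to be pushed to the other sites and replayed.
 */
public class SqlJournal {
    private final Path journal;

    public SqlJournal(Path journal) {
        this.journal = journal;
    }

    /** Append one executed SQL order so the other sites can replay it. */
    public void append(String sql) throws IOException {
        Files.writeString(journal, sql + ";\n",
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Orders waiting to be shipped to the other sites. */
    public List<String> pendingOrders() throws IOException {
        return Files.readAllLines(journal);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("lsd-journal", ".sql");
        SqlJournal journal = new SqlJournal(file);
        journal.append("INSERT INTO document (id, legacy) VALUES (42, 1)");
        journal.append("DELETE FROM document WHERE id = 7");
        System.out.println(journal.pendingOrders().size()); // two orders to ship
    }
}
```

Replaying orders in file order preserves the master's sequence of changes, which is what makes this simple scheme safe for a single-master plan.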

     

    Once the database is replicated, the security is complete and access (at least in read mode) is possible everywhere, both to the OpenLSD Servers and to the database schema.