
A search on Habr turned up no detailed articles about Alfresco. In this article I will try to kill two birds with one stone: explain what Alfresco is and describe how we use it in our work.
How are documents stored in a small organization? The simplest option is a local disk. When people need to work together, files are sent by email or, most commonly, kept on a network drive. Google Docs is another great option, but I am not sure it is widely used in Russia.
I don't know how large an organization has to become before it starts thinking about an electronic document management system, but my guess is somewhere around 50-100 employees who work with documents.
When thinking about an electronic document management system, expensive solutions from well-known vendors such as Microsoft, EMC, and 1C come to mind first. But there is an alternative to closed solutions: Alfresco, an open source enterprise content management system (ECM, CMS).
Alfresco's competitors are proprietary products such as EMC Documentum, Open Text, and SharePoint. The Alfresco developers themselves describe these competitors as a legacy of the 90s that is:
- too expensive
- too difficult to use, deploy, scale
- too difficult to modify to fit your needs
- too “proprietary”
I will try to describe the system, and you can decide for yourself whether the developers are right.
What is Alfresco
Alfresco was originally conceived as an open source alternative to Microsoft SharePoint. Over the course of development it moved away from that goal, and it now offers a number of features that similar systems lack. Suffice it to say that Alfresco works reliably over the SharePoint protocol via HTTPS.
I see the openness of the system as its main advantage: there is no vendor lock-in, and the system itself is free. Another advantage is that it is built on modern Java technologies such as Spring, JSF, Hibernate, and Lucene; newer versions will use Spring Surf. And I know that big, serious business loves systems written in Java.
Users work with the system through a browser. Files can also be accessed through Windows Explorer, as with a regular network folder (CIFS protocol), or via FTP. We work with the English version; a Russian localization is also available.
[Screenshot: the standard Alfresco document management page]
Alfresco lets you create, store, and modify documents, and more. A document can be created directly in the system, either empty or based on your company's templates. The system supports searching through document contents and document versioning. The full change history is stored, so you can always see who added or deleted what.
There is a document workflow system, with the ability to change the workflow scheme on the fly. A good article on the topic: “Electronic document management, or what not to do.”
Is it suitable for your tasks? Extensibility
Alfresco is ready to use out of the box: you can download the free Community Edition, install it, and start using it today. There is also a paid Enterprise Edition; the main difference is the availability of technical support.
Alfresco runs on both Windows and *nix systems; a Java Runtime Environment is required. The distribution bundles OpenOffice, which is used for converting between document types and for extracting text for indexing and full-text search. Tomcat is also included and, if desired, can be replaced with any suitable web container.
Alfresco maintains its own user database. However, users can also be created automatically on first login or synchronized from an external source: LDAP, Microsoft Active Directory, a company domain, etc.
Industry-accepted ECM standards are supported. Alfresco's storage layer is gradually moving from its own implementation of the JSR-170 standard to data access via CMIS, which removes the last limitation: the requirement to use the repository supplied with Alfresco.
The system works with documents of any format: Microsoft Office, OpenOffice, PDF, etc. If a required format is not on the supported list, you can add your own module that converts it to one of the supported formats, and conversion chains to all the necessary output formats will then be built.
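To illustrate how such conversion chains can be assembled, here is a minimal Python sketch. It is not Alfresco's actual implementation: the `find_chain` and `register_converter` functions and the format table are hypothetical, and I am assuming a breadth-first search as the way to pick the shortest chain.

```python
from collections import deque

# Hypothetical table of direct converters: source format -> formats it converts to.
# The pairs are illustrative only, not Alfresco's real transformer registry.
CONVERTERS = {
    "doc": ["odt"],
    "odt": ["pdf", "txt"],
    "pdf": ["txt", "swf"],
}


def find_chain(source, target, converters=CONVERTERS):
    """Return the shortest list of formats leading from source to target, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in converters.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def register_converter(source, target, converters=CONVERTERS):
    """Plug in a new direct converter; every chain that starts at target becomes reachable."""
    converters.setdefault(source, []).append(target)
```

In this sketch, registering a single converter from a new format (say, `register_converter("dwg", "pdf")`) is enough for `find_chain("dwg", "swf")` to find a route through PDF, which is the point of the paragraph above: one module to any supported format opens up all the output formats.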
The advantage of Alfresco as an open system is full access to the source code: you can change any part of the system, provided you have good specialists, of course. The license allows it.
The system's functionality can be extended with extension modules. Modules can contain anything: business logic, page styles, new pages, data model extensions, and new services. Extension modules can talk to Alfresco over a number of protocols; REST is the best supported. The user interface is meant to be implemented with Spring Surf; beyond that there are no restrictions. Java is used most often, and server-side JavaScript, Groovy, or JRuby less often. The main thing is to support CMIS.
You can completely abandon the standard web interface and implement your own. In that case Alfresco is used only as a repository.
For integration with other software, several types of authentication are supported, and they can be combined into chains. For example, a user can enter the system via single sign-on. If a user arrives unauthenticated, Alfresco will try to authenticate them (by asking for a username and password, or a certificate, depending on how the system is configured).
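The idea of an authentication chain can be sketched in a few lines of Python. This is only a model of the concept, not Alfresco code: the request is a plain dictionary, and both authenticators (`sso_authenticator`, `password_authenticator`) and the in-memory user table are assumptions made up for the example.

```python
from typing import Callable, List, Optional

# An authenticator takes a request (a plain dict here) and returns a user name or None.
Authenticator = Callable[[dict], Optional[str]]


def sso_authenticator(request: dict) -> Optional[str]:
    """Hypothetical single sign-on step: trust a user name set by an upstream system."""
    return request.get("sso_user")


def password_authenticator(request: dict) -> Optional[str]:
    """Hypothetical password step checked against an in-memory table (illustration only)."""
    users = {"alice": "secret"}
    name = request.get("username")
    if name in users and users[name] == request.get("password"):
        return name
    return None


def authenticate(request: dict, chain: List[Authenticator]) -> Optional[str]:
    """Try each authenticator in order; the first one that recognizes the user wins."""
    for authenticator in chain:
        user = authenticator(request)
        if user is not None:
            return user
    return None  # nobody recognized the user: the caller should ask for credentials


CHAIN = [sso_authenticator, password_authenticator]
```

A request that already carries a single sign-on identity is accepted by the first link, while a request with a username and password falls through to the second. When every link returns `None`, the caller is expected to prompt the user, as described above.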
Alfresco has a very flexible data model with many ways to extend it, but that is a topic for a separate article. In short, the model supports multiple inheritance (through aspects) and is dynamic: at any time you can add an aspect to any object, and the object acquires all the properties of that aspect.
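A rough Python model of the aspect idea is shown below. It is a sketch under my own assumptions, not the Alfresco API: the `Node` class, the `ASPECTS` table, and the aspect and property names are all invented for illustration.

```python
class Node:
    """A repository object with its own properties plus aspects attached at runtime."""

    def __init__(self, type_properties):
        self.properties = dict(type_properties)
        self.aspects = set()

    def add_aspect(self, name, aspect_definitions):
        """Attach an aspect: the node gains every property the aspect defines."""
        for prop, default in aspect_definitions[name].items():
            # Do not overwrite a value the node already has.
            self.properties.setdefault(prop, default)
        self.aspects.add(name)

    def has_aspect(self, name):
        return name in self.aspects


# Hypothetical aspect definitions: aspect name -> {property: default value}.
ASPECTS = {
    "versionable": {"version_label": "1.0"},
    "taggable": {"tags": ()},
}
```

For example, a node created as `Node({"name": "contract.doc"})` starts with only its name; after `add_aspect("versionable", ASPECTS)` and `add_aspect("taggable", ASPECTS)` it also carries `version_label` and `tags`. Several aspects on one object is what gives the effect of multiple inheritance.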
Access to data and functionality can be configured flexibly. The authorization system operates with the following concepts: data object, permission, user, group, and role. Roles are assigned to users and groups at runtime, and a role can be assigned in cascade to an entire subtree of data.
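The cascade can be pictured with a small Python sketch. It models the concepts listed above (data object, permission, user, group, role) but is not how Alfresco implements them: the `DataObject` class, the `can` helper, and the role-to-permission table are assumptions for illustration.

```python
class DataObject:
    """A data object in a tree; roles set on it cascade to its whole subtree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.local_roles = {}  # authority (user or group name) -> set of role names

    def assign(self, authority, role):
        """Assign a role to a user or group on this object at runtime."""
        self.local_roles.setdefault(authority, set()).add(role)

    def roles_for(self, user, groups=()):
        """Collect roles for the user and the user's groups from this object up to the root."""
        roles = set()
        node = self
        while node is not None:
            for authority in (user, *groups):
                roles |= node.local_roles.get(authority, set())
            node = node.parent
        return roles


# Hypothetical role -> permissions table; the names are illustrative only.
ROLE_PERMISSIONS = {
    "Consumer": {"read"},
    "Editor": {"read", "write"},
}


def can(node, user, permission, groups=()):
    """True if any role held on node, directly or by inheritance, grants the permission."""
    return any(permission in ROLE_PERMISSIONS[r] for r in node.roles_for(user, groups))
```

In this model, assigning a role on a folder is enough for it to apply to every object below that folder, because the lookup walks up the tree. A real system would also need a way to break inheritance, which this sketch leaves out.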
A large number of ready-made extensions for Alfresco are available.
Number of users. Scalability
Because Alfresco is open and free, you are not limited by the number of client licenses. Instead, you are limited by the performance of your servers and database and by your ability to scale the system.
In our experience, a server with a 2.4 GHz Intel Core 2 and 8 GB of memory is enough to serve up to a thousand registered active users. As the number of users grows, you need to analyze which parts of the system carry the most load. The system works reliably in a cluster, preserving the integrity and freshness of the data, but this requires careful configuration, which is described in more detail below.
There are examples of Alfresco deployments in a large non-profit organization in Russia with a user base of 40,000 or more. Deployments abroad include cases with hundreds of thousands of active users, as well as cases with far fewer users but multi-terabyte storage.
Our experience in implementing Alfresco
The system is used at a company that is the largest software maker in Europe. Estimated number of internal users: 30,000. Expected number of external users: over 3 million.
Alfresco was chosen as the only ECM system on the market that offered good enterprise support, an implementation of the SharePoint protocol, and reference deployments with 1000+ users. Microsoft SharePoint was not considered, as far as I know, although perhaps it simply did not meet other criteria.
Currently, about 2000 documents of 5-10 MB each are stored in the repository.
Major improvements made:
- Changed the look of the system: added headers and company logos where necessary.
- Modified Alfresco to work with the application server, database, and authentication system adopted as standards within the company.
- Linked Alfresco to existing metadata on the company's portal, such as country registries, customer categories, etc.
- Added a module for creating so-called “projects” from templates and for creating documents from templates.
- Reworked the access control system. According to Alfresco, this is the only deployment that uses the Alfresco access control system so deeply.
- Added publication of documents that pass through the workflow stages to other company resources, and reverse import of documents into the system.
- Significantly changed the standard workflow to comply with company standards.
- Added the ability to customize the workflow on the fly through the user interface, including sending notifications to the people responsible for the work at each stage.
- Integrated a third-party library for converting documents and extracting data from them.
The system has already gone into production. We have run into a number of problems, and some of them are still unresolved.
For example, the system works quite well when run on a developer's local machine. However, when run at the client's site in a cluster of 5 application servers, it sometimes slows down for no apparent reason. The problem has not yet been resolved, even though the Alfresco developers themselves have gotten involved.
Unfortunately, our system's architecture is such that the search engine (Lucene) indexes are stored on a network drive. This seriously contradicts the developers' recommendations, and we often see the indexes fail as a result.
Another problem is OpenOffice, which is used for converting documents and extracting data from them. Even the latest version of OpenOffice in server mode can convert only one file at a time; an attempt to convert several files simultaneously leads to unpredictable results. OpenOffice also has the unpleasant habit of consuming more and more memory over time and then ceasing to respond. I can recommend a couple of approaches:
- use JODConverter to start and automatically restart several OpenOffice servers at once;
- use other libraries for conversion and data extraction (for example Aspose, although it is a paid product).
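The first approach boils down to a pool in which every OpenOffice instance handles only one conversion at a time. Below is a minimal Python sketch of that idea. It does not use JODConverter's API and does not start real OpenOffice processes: `ConverterPool`, `FakeOffice`, `convert_all`, and the port numbers are all hypothetical stand-ins, and automatic restarts are left out.

```python
import queue
import threading
from contextlib import contextmanager


class ConverterPool:
    """Hands out converter instances so that each one handles a single job at a time."""

    def __init__(self, instances):
        self._idle = queue.Queue()
        for instance in instances:
            self._idle.put(instance)

    @contextmanager
    def borrow(self):
        instance = self._idle.get()  # blocks until some instance is free
        try:
            yield instance
        finally:
            self._idle.put(instance)  # return it even if the conversion failed


class FakeOffice:
    """Stand-in for one OpenOffice server process; counts overlapping calls."""

    def __init__(self, port):
        self.port = port  # hypothetical port number, not used by the fake
        self._busy = threading.Lock()
        self.overlaps = 0

    def convert(self, name):
        if not self._busy.acquire(blocking=False):
            self.overlaps += 1  # a real server would misbehave here
            return None
        try:
            return name.rsplit(".", 1)[0] + ".pdf"
        finally:
            self._busy.release()


def convert_all(pool, names, threads=8):
    """Convert many files from several threads without sharing an instance concurrently."""
    results = {}
    jobs = queue.Queue()
    for name in names:
        jobs.put(name)

    def worker():
        while True:
            try:
                name = jobs.get_nowait()
            except queue.Empty:
                return
            with pool.borrow() as office:
                results[name] = office.convert(name)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

With two instances in the pool, for example `ConverterPool([FakeOffice(8100), FakeOffice(8101)])`, at most two conversions run in parallel no matter how many threads ask for one. A fuller version would also count jobs per instance and restart an instance after some limit, to cope with the memory growth mentioned above.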
Developers recommend using MySQL / InnoDB as a metadata repository, but you can use other databases for which Hibernate / iBatis dialects exist.
There are also a number of recommendations that improve performance and reliability. Among the most important:
- as mentioned, do not use network drives to store Lucene indexes;
- use a file system with modern anti-fragmentation features (ext4).
At the moment, our project is still under active development. Despite some managerial and technical mistakes in our implementation, I like Alfresco itself: it is pleasant to work with, and I believe in the future of open systems for business.
Conclusion
Alfresco is a good foundation for building a company's document workflow. I think that in the near future Alfresco could replace many obsolete systems. Of course, several unsolved problems remain, and Alfresco is unlikely to take over the whole world, but I think it is quite capable of capturing a substantial share of the market for corporate document storage and workflow.
Alfresco can also be used in the cloud. For example, Amazon AWS already offers ready-made instances with Alfresco pre-installed.
Rumor has it that Oracle has its eye on acquiring Alfresco. Whether that is a threat or a promise for Alfresco is still unknown; only time will tell.
It would be very interesting to read in the comments about your own experience of deploying Alfresco.