Symfony CMF. Part 1, data storage

In place of a preface

I have been programming with Yii for two years, and recently I started looking closely at the Symfony 2 framework. I am drawn partly by its well-thought-out architecture, partly by the loose coupling of its components, and partly by the flexibility of the applications built on it. As soon as I had figured out the basic internals of the new framework, I wondered whether a CMS could be built on it, or perhaps whether a ready-made one already existed.

I have not found an out-of-the-box solution yet. However, I somehow wandered onto the Symfony CMF project site and was struck by its methodical approach to the problems I had run into while working on an assembly line of fitting designs onto Drupal. There are no publications about CMF on Habr, and the project itself is still very raw, but in the long run it all looks interesting, even if there is something to complain about in places.

Symfony CMF

The Symfony CMF project aims to simplify building CMS-like functionality for everyone who uses the Symfony 2 framework.
The main characteristics of the project:

loose coupling
scalability
convenience
testability

The emphasis belongs on the letter F in CMF: the project is not a CMS in itself, it is a framework. Unlike a CMS, where all components are tightly bound to each other, in Symfony CMF you:

use whatever you want
replace what you don't like
ignore what is not required

In other words, you get a set of modular development tools rather than a turnkey application, although the basic bundles that provide CMS functionality have already been developed.

Why another CMF?

It is no secret that the market is full of ready-made products, both paid (1C-Bitrix, UMI) and free (Drupal, MODx, WordPress, Joomla). So it is quite natural that the sight of yet another "Whatever CMS/CMF" label raises the question: why make another CMS at all? There are plenty of them already.
And I absolutely agree. As a user.

CMSs really are a dime a dozen. But as a developer, I have often shed sweat, blood and tears trying to get something out of them that the authors of the core system and of third-party extensions had not provided for.

Because the architecture is not thought through well enough, working with ready-made solutions means dealing with:

the lack of a clear separation of logic, configuration, content and presentation. Recall Drupal modules: a pile of files, a jumble of global functions named who knows how, hooks and so on. Here, by the way, is a good article that discusses this issue
a lot of legacy code left over from old versions. From time to time the developers try to fix this, promising a rewritten core and other joys, but a lot of time can pass before the new (rewritten) version reaches the "usable" stage
often there is no notion of separate development and testing environments, and there are no deployment tools
caching problems: in some places there is no caching, in others it exists but is not flexible enough, or has invalidation problems, or simply does not help, and so on
poor performance on large (the notion is relative here) data volumes
difficulty in creating your own components or overriding existing ones
you have to choose among predefined data types, or settle for EAV storage on top of a relational DBMS, or worse
awkward home-grown template engines, reinvented wheels from the CMS authors ...
... and all this as a consequence of the NIH syndrome.

The developers of these systems are aware of the flaws and do not deny the charges, but not all of these problems can be solved at the moment. Still, rather than berate everyone, let us list the tasks a CMS must solve for the user's convenience and then see how Symfony CMF solves them. The tasks are:

data storage
template system
routing, human-readable URLs, and how the user can control all of this
menu configuration
content management (editable blocks on the page, front-end editing on a live site, uploading files)
i18n
a good admin panel

Let's take them in order.

Data storage problem

From the very meaning of the term CMS it follows that data storage is its most important component. Moreover, the CMS should be able to store data with different properties. For example, materials such as BlogPost or NewsItem can share the fields title and body, and then the differences begin: you may need to attach pictures to a news item.

Imagine an online store. What is stored in its database? At the very least, product descriptions and order history. A storage schema is much easier to design for the latter than for the former, although obviously neither can exist without the other. Hence the next requirement: the CMS should be able to reference content both within the CMS and in other parts of the system.

Content on a site is most often organized as a tree structure that in some ways mirrors a file system. At the same time, the site's authors want to organize content in different ways depending on their needs, and to flexibly adjust the menus and addresses of materials. Thus, the CMS must present data as a tree structure and be able to maintain several independent trees at once.

The information users enter into a CMS is rarely perfectly structured. Sometimes you need to add one more field, then another, a third, a tenth. The CMS should not impose a single content schema or, better still, should let you define your own schema.

In large organizations, material often goes through several stages of review before it appears on the site instead of being published with one click, so the CMS should support moving and exporting content between trees. And for history's sake, it would be good to keep versions of the content that can be restored at any time.

Users from other countries and regions should not be forgotten. Although a site usually does not need to be translated into another language in full, the CMS should make it possible to present content in different languages, with an optional fallback according to specified rules.

When there is too much content, you will certainly need full-text search, the ability to define access control rules for subtrees, and support for the process by which several authors publish a document (everyone's workflow is different).

Content Repository

It becomes clear that MySQL alone will not do. Relational databases simply cannot cope with such tasks, although techniques such as Materialized path or Nested set make it possible to store a tree in flat tables. But even if a particular implementation works, it will most likely be rigidly tied to a specific engine, and that is bad because it deprives us of freedom and flexibility. There is no need to blame the RDBMSs: they were conceived for entirely different tasks and need clearly described data, not trees of loosely structured elements.

But let's not be upset: content repositories were invented long ago. A repository is designed to provide read, write and search access to data independently of the applications that need the data. In essence, it is a data store that emphasizes the logical aspect of data handling.
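
To make the Materialized path idea more concrete, here is a minimal sketch in plain PHP. The `$rows` array and the `findDescendantPaths()` helper are made up for illustration: an array stands in for a flat table in which every row stores its full path, and the prefix check plays the role of a `LIKE '/cms/pages/%'` condition.

```php
<?php
// Hypothetical flat "table": each row stores its full path, the way a
// relational table would with the Materialized path technique.
$rows = [
    ['path' => '/cms',               'title' => 'CMS root'],
    ['path' => '/cms/pages',         'title' => 'Pages'],
    ['path' => '/cms/pages/home',    'title' => 'Hello'],
    ['path' => '/cms/pages/contact', 'title' => 'Contact'],
    ['path' => '/cms/pagesize',      'title' => 'Not a descendant of /cms/pages'],
];

/**
 * Return the paths of all descendants of $ancestor.
 * Similar in spirit to: SELECT path FROM nodes WHERE path LIKE '/cms/pages/%'
 */
function findDescendantPaths(array $rows, string $ancestor): array
{
    // The trailing slash keeps "/cms/pagesize" out of the "/cms/pages" subtree.
    $prefix = rtrim($ancestor, '/') . '/';
    $found = [];
    foreach ($rows as $row) {
        if (strncmp($row['path'], $prefix, strlen($prefix)) === 0) {
            $found[] = $row['path'];
        }
    }
    return $found;
}

print_r(findDescendantPaths($rows, '/cms/pages'));
```

Reading a subtree is cheap here, but moving a node means rewriting the path of every descendant, which is one of the reasons such schemes end up tied to a particular implementation.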

JSR-170

The problem of data storage for document-oriented systems arose many years ago, so back in the first half of the 2000s people from Day Software (namely David Nüscheler) submitted a request through the Java Community Process to adopt the Content Repository API for Java (JCR) specification, which was assigned the number 170. Later versions of the specification were published as JSR-283 (2.0) and JSR-333 (2.1, whose final draft was completed on August 31), but references to the first version are still the most common.

According to the specification, the repository is an object database that provides storage, search and retrieval of hierarchical data. In addition, the API offers data versioning, transactions, change tracking, import/export to XML, and storage of binary data and metadata.

Such a repository is organized as a tree of nodes that have properties. The data itself is stored in the properties and can be numbers, strings, or binary data of arbitrary length. Nodes can have types, child nodes and certain behavioral characteristics, and can also simply refer to other nodes (through a special property and the unique identifier that each node has).

Starting with the second version of the specification, the repository must be able to answer SQL queries, which are more convenient than their XPath counterparts from the first edition.

A vivid example of an implementation is the Apache Jackrabbit project, an open-source repository written in Java. In addition to all the goodies described above, this project (started back in 2004 as the initial implementation of the JCR API) offers flexible access control to the content. There are also clustering, locking mechanisms and so on, but they are not very interesting for us right now, so we'll skip them.

PHPCR

But not everyone writes in Java! (Let's skip the jokes on this topic.)
For people like us, the Content Repository for PHP (PHPCR) was created: the JCR API described above, adapted to the style of PHP. Since the API is the same and well specified, it follows that you can write the application once and then simply swap the backends (in theory, of course).
An important plus is that we do not reinvent the wheel (as we remember, the problem of data storage in a CMS has already been solved).
Of course, such an initiative could not go unnoticed: David submitted a request to include PHPCR in JCR 2.1. Very nice.

Since you cannot simply take an API and port it from Java to PHP, there are still differences between the implementations. In short, they stem from the fact that PHP is weakly typed and does not support method overloading. Therefore, some interfaces and functions were simply dropped as unnecessary, and where methods were overloaded, optional arguments were added instead. The differences are described in detail here, and there is nothing terrible in them.

Currently PHPCR supports the following features:
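
As a rough illustration of the "optional arguments instead of overloads" approach, here is a sketch in plain PHP. The `filterChildNames()` helper is hypothetical and is not part of PHPCR; it only shows how one PHP function can stand in for a family of Java overloads (no filter, a single name pattern, a list of patterns).

```php
<?php
/**
 * Hypothetical helper, not part of PHPCR: a single function with an optional
 * argument replaces three overloaded Java-style methods.
 *
 * @param string[]             $names  child node names
 * @param string|string[]|null $filter glob pattern, list of patterns, or null for "all"
 */
function filterChildNames(array $names, $filter = null): array
{
    if ($filter === null) {
        return array_values($names);      // the "no arguments" overload
    }
    $patterns = (array) $filter;          // a single string becomes a one-item list
    $matched = [];
    foreach ($names as $name) {
        foreach ($patterns as $pattern) {
            if (fnmatch($pattern, $name)) {
                $matched[] = $name;
                break;                    // do not add the same name twice
            }
        }
    }
    return $matched;
}

$children = ['dir', 'disk', 'home', 'news'];
print_r(filterChildNames($children));
print_r(filterChildNames($children, 'di*'));
print_r(filterChildNames($children, ['di*', 'ne*']));
```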

tree access
access to nodes by UUID
node search
versioning
capability discovery
import and export to XML
locks
transactions*
permissions
access control*
change tracking

(*) Not yet implemented in Jackalope-Jackrabbit (more on it below), although this information may be slightly out of date.

Key PHPCR concepts:

all content is stored in the node tree
nodes have name and type
nodes have child nodes and value-storing properties
property values can store numbers, strings, binary objects, and references to other nodes.

Sounds familiar, doesn't it?

Let's see what such a repository might look like (schematically, of course):

    <root>
        <cms>
            <pages>
                <home title="Hello">
                    <block title="News" content="Today: PHPCR presentation"></block>
                </home>
                <contact title="Contact" content="phpcr-users@groups.google.com"></contact>
            </pages>
        </cms>
    </root>

So far, nothing supernatural.
Let's look in a little more detail at what you will have to work with.

Nodes

a node is a named container that always has a parent
resembles an XML element
nodes can be created, deleted, modified and copied
the path to a node consists of the parent node's path and the node's own name:
Path: /cms/pages/home
Parent path: /cms/pages
Node name: home
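
The relationship between a path, a parent path and a node name can be sketched with two small helpers. Both `splitNodePath()` and `joinNodePath()` are names invented for this illustration and work on plain strings; they are not part of the PHPCR API.

```php
<?php
/**
 * Split an absolute node path into [parent path, node name].
 * Illustrative helper; the root node "/" has no parent, so null is returned for it.
 */
function splitNodePath(string $path): array
{
    if ($path === '/') {
        return [null, ''];
    }
    $pos = strrpos($path, '/');
    $parent = $pos === 0 ? '/' : substr($path, 0, $pos);
    return [$parent, substr($path, $pos + 1)];
}

/** Build a node path from the parent path and the node name. */
function joinNodePath(string $parentPath, string $name): string
{
    return rtrim($parentPath, '/') . '/' . $name;
}

list($parent, $name) = splitNodePath('/cms/pages/home');
echo "Parent path: $parent\n";
echo "Node name: $name\n";
echo 'Path: ' . joinNodePath($parent, $name) . "\n";
```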

Node properties

nodes have named properties that store values
resemble XML attributes
data types: STRING, URI, BOOLEAN, LONG, DOUBLE, DECIMAL, BINARY, DATE, NAME, PATH, WEAKREFERENCE, REFERENCE
the (WEAK)REFERENCE types create links to other nodes
nodes and properties can have namespaces: jcr:created, jcr:mimeType, phpcr:class
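
The practical difference between the two reference types, as I understand the JCR specification, is referential integrity: a node that is the target of a REFERENCE property cannot be removed, while a WEAKREFERENCE does not protect its target. The sketch below models only that rule in plain PHP; the `SketchReferenceStore` class and its methods are hypothetical and have nothing to do with the real API.

```php
<?php
// Hypothetical in-memory model of REFERENCE vs WEAKREFERENCE; not the PHPCR API.
class SketchReferenceStore
{
    private $nodes = [];     // uuid => path
    private $hardRefs = [];  // target uuid => number of REFERENCE properties pointing at it

    public function addNode(string $uuid, string $path): void
    {
        $this->nodes[$uuid] = $path;
    }

    public function addReference(string $targetUuid, bool $weak): void
    {
        if (!isset($this->nodes[$targetUuid])) {
            throw new InvalidArgumentException('Unknown target node');
        }
        if (!$weak) {
            // Only hard references are counted; weak ones are not tracked at all.
            $this->hardRefs[$targetUuid] = ($this->hardRefs[$targetUuid] ?? 0) + 1;
        }
    }

    public function removeNode(string $uuid): void
    {
        if (($this->hardRefs[$uuid] ?? 0) > 0) {
            throw new RuntimeException('Node is still the target of a REFERENCE property');
        }
        unset($this->nodes[$uuid]);
    }

    public function has(string $uuid): bool
    {
        return isset($this->nodes[$uuid]);
    }
}

$store = new SketchReferenceStore();
$store->addNode('uuid-home', '/cms/pages/home');
$store->addNode('uuid-contact', '/cms/pages/contact');
$store->addReference('uuid-home', false);    // REFERENCE: protects the target
$store->addReference('uuid-contact', true);  // WEAKREFERENCE: does not

$store->removeNode('uuid-contact');          // allowed
try {
    $store->removeNode('uuid-home');         // refused
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```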

Primary node types

define the allowed names and types of properties and child nodes
each node must have a primary type
nt:unstructured is used to store anything
other built-in types include nt:address, nt:folder, nt:file and others
you can define new node types to create your own schema

Mixin node types

primary types do not support multiple inheritance
but there are mixin types that add trait-like functionality to nodes
mixin types can be assigned to a node during its lifetime

Example: suppose we have a jcr:uuid property that stores a unique identifier. Building on it, we can define the mixin mix:referenceable, and on top of that mix:versionable (which additionally needs the properties jcr:versionHistory, jcr:predecessors, jcr:baseVersion, jcr:isCheckedOut, jcr:mergeFailed, etc.).
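
Here is a deliberately simplified sketch of how a mixin can be attached to a node at runtime and bring its properties along. The `SketchNode` class is hypothetical, ignores type inheritance, and is not the PHPCR API; the only rule it models is that mix:referenceable gives the node a jcr:uuid.

```php
<?php
// Hypothetical in-memory model of mixin types; not the PHPCR API.
class SketchNode
{
    private $primaryType;
    private $mixins = [];
    private $properties = [];

    public function __construct(string $primaryType)
    {
        $this->primaryType = $primaryType;
    }

    // A mixin can be added at any point in the node's lifetime.
    public function addMixin(string $mixin): void
    {
        if (!in_array($mixin, $this->mixins, true)) {
            $this->mixins[] = $mixin;
        }
        // mix:referenceable brings the jcr:uuid property with it.
        if ($mixin === 'mix:referenceable' && !isset($this->properties['jcr:uuid'])) {
            $this->properties['jcr:uuid'] = bin2hex(random_bytes(16));
        }
    }

    // Simplified check: only the primary type and directly assigned mixins count.
    public function isNodeType(string $type): bool
    {
        return $type === $this->primaryType || in_array($type, $this->mixins, true);
    }

    public function hasProperty(string $name): bool
    {
        return isset($this->properties[$name]);
    }
}

$node = new SketchNode('nt:unstructured');
var_dump($node->isNodeType('mix:referenceable')); // no mixin assigned yet
$node->addMixin('mix:referenceable');
var_dump($node->isNodeType('mix:referenceable'));
var_dump($node->hasProperty('jcr:uuid'));
```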

Workspaces

there can be several workspaces, each with its own node tree
they resemble a Unix file system and branches in Git/SVN: each can be cloned and merged
they can be used independently of each other

And now some examples of how to work with all this:

Session creation

    use PHPCR\SimpleCredentials;
    // the Jackrabbit backend of Jackalope is used here; other implementations have their own factories
    use Jackalope\RepositoryFactoryJackrabbit as Factory;

    $parameters = array(
        'jackalope.jackrabbit_uri' => 'http://localhost:8080/server',
    );
    $repository = Factory::getRepository($parameters);

    // log in to the 'default' workspace with the given credentials
    $creds = new SimpleCredentials('admin', 'admin');
    $session = $repository->login($creds, 'default');

CRUD operations

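    $root = $session->getRootNode();

    // create a child node of the root
    $node = $root->addNode('test', 'nt:unstructured');

    // read a node by its absolute path
    $node = $session->getNode('/test');

    // create or update a property
    $node->setProperty('prop', 'value');

    // changes reach the repository only on save()
    $session->save();

    // delete the node
    $node->remove();

    // as before, nothing is actually removed until save()
    $session->save();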

Tree traversal

    $node = $session->getNode('/site/content');

    foreach ($node->getNodes() as $child) {
        var_dump($child->getName());
    }

    // shorter: the node itself is iterable
    foreach ($node as $child) {
        var_dump($child->getName());
    }

    // filter child nodes by a name pattern
    foreach ($node->getNodes('di*') as $child) {
        var_dump($child->getName());
    }

Versioning

    // make the node versionable
    $node = $session->getNode('/site/content/about');
    $node->addMixin('mix:versionable');
    $session->save();

    // set the initial value
    $node->setProperty('title', 'About');
    $session->save();

    // checkin creates a version (the node becomes read-only),
    // checkout makes the node writable again,
    // checkpoint does both in one call and must follow $session->save()
    $vm = $session->getWorkspace()->getVersionManager();
    $vm->checkpoint($node->getPath());

    // change the value
    $node->setProperty('title', 'Ups');
    $session->save();

    // create another version and leave the node read-only
    $vm->checkin($node->getPath());

    $base = $vm->getBaseVersion($node->getPath());
    $current = $base->getLinearPredecessor();
    $previous = $current->getLinearPredecessor();

    // read a property from the frozen node of the old version
    $frozenNode = $previous->getFrozenNode();
    echo $frozenNode->getProperty('title'); // About

    // restore the old version
    $vm->restore(true, $previous);
    $node = $session->getNode('/site/content/about');
    echo $node->getProperty('title'); // About

    // querying with JCR-SQL2
    // (see http://docs.jboss.org/exojcr/1.12.13-GA/developer/en-US/html/ch-jcr-query-usecases.html#d0e3332)
    $workspace = $session->getWorkspace();
    $qm = $workspace->getQueryManager();

    $sql = "SELECT * FROM [nt:unstructured]
        WHERE [nt:unstructured].type = 'nav'
        AND ISDESCENDANTNODE('/some/path')
        ORDER BY score, [nt:unstructured].title";
    $query = $qm->createQuery($sql, 'JCR-SQL2');
    $query->setLimit($limit);
    $query->setOffset($offset);
    $queryResult = $query->execute();

    foreach ($queryResult->getNodes() as $node) {
        var_dump($node->getPath());
    }

Other code examples can be found in this presentation.

But let us return to the alluring idea of interchangeable backends.

There are not many implementations at the moment, but the existing ones are already interesting:

Midgard2 PHPCR
Jackalope
    supports Jackrabbit
    supports Doctrine DBAL (data storage in relational databases)
    supports MongoDB (actually not)

Midgard2 PHPCR

Midgard2 is an open source content repository with bindings for C, Python and PHP.

With terminology slightly different from JCR's, Midgard2 provides the same content access functions through Midgard2 PHPCR, using the php5-midgard2 extension. Being built on top of the GNOME libgda library, Midgard2 supports an impressive list of relational databases in which you can place your repository.

Let me mention the fly in the ointment right away: the PHP extension is built for a rather small number of operating systems:

on Debian 7 Wheezy the package is still in the unstable branch (and deservedly so: it silently crashes PHP-FPM with a segfault)
for CentOS the packages are either outdated or not available for all architectures (where they exist they most likely work, but I did not get around to checking)
Windows builds do not exist at all (possibly the GNOME roots of Midgard2 itself are to blame, although the repository still holds four-year-old files for PHP4)
I could not test it on Mac OS because I do not have one (but judging by the site, everything installs through brew)

Everything installed successfully on Ubuntu Server 12.04: the packages are fresh and nothing crashes.

However, from talking to the Symfony CMF developers on IRC, I learned that this backend provider had been broken for several months, and even its tests were disabled. The cause lies somewhere on the Midgard2 team's side, although bergie promised to fix it.


Midgard2 PHPCR did not work for me as part of Symfony CMF. Maybe it will for someone else. If not now, then later.

Jackalope

Continuing the hare theme in the names (Jackrabbit, Jackalope), Jackalope provides access to three types of data stores:

the already familiar Apache Jackrabbit
the Doctrine Database Abstraction Layer, which lets you use the engines DBAL supports. That is the theory; in practice only MySQL, PostgreSQL and SQLite have been tested (is anyone using anything else?)
MongoDB (not updated for two years, most likely broken or out of date)

Jackalope (jackalope-jackrabbit in particular) is fairly stable and is recommended as the most feature-complete implementation of the PHPCR API. We will work with it. However, phpcr-api-tests, which check the availability and correct operation of the PHPCR API, are also run for jackalope-doctrine-dbal, which may eventually catch up.

PHPCR Summary

So, we have an API (adapted for PHP) for accessing content repositories that conform to the JCR standard. Several libraries have been developed for this API that decouple application code from the data store.

At this point two main questions should arise, and both will be answered:

When to use PHPCR?

When you work with hierarchical navigation structures
When your data is interrelated
When data versioning is needed

When NOT to use PHPCR?

For strictly structured content and the use of aggregate queries, it is recommended to use relational databases. For example, in an online store, product descriptions can be stored in PHPCR, and orders can be stored in RDBMS.

PHPCR ODM

The specification is great, but the API is too abstract and inconvenient for everyday use (after all, most of us are used to some ORM). This is where the PHPCR ODM project comes in, which combines PHPCR with an Object Document Mapper.

Doctrine ORM, familiar to developers who use SF2 (and not only SF2), implements the Data mapper pattern to access data stored in an RDBMS.

ODM, like Doctrine ORM, uses Data mapper to completely separate business logic from the data storage layer, which in this case is the content repository. The authors honestly admit that ODM is inspired by the ideas of Hibernate.

ODM stores objects as PHPCR nodes and calls them documents. Since PHPCR is already implementation-independent, there is no need to write a new database abstraction layer (DBAL).

What is a document in PHPCR ODM terminology?

A document is a plain PHP class that implements no interfaces (you can always implement some, but the library itself does not require it) and does not inherit from any abstract base class. Such a class should not declare final methods, and should either not implement the __clone() and __wakeup() methods or implement them very carefully. A document consists of properties persisted in the repository. Since ODM works on top of the Doctrine Common library, which provides the basic functionality (annotations, caching and class autoloading), repository properties are mapped to class properties in the familiar way: through annotations in PHP docblocks or through YAML/XML configuration. In the example, each document has a title (title) and content (content). All documents are organized as a tree and can refer to other documents. Take a look at the sample document:

    namespace Demo;

    use Doctrine\ODM\PHPCR\Mapping\Annotations as PHPCRODM;

    /**
     * @PHPCRODM\Document
     */
    class MyDocument
    {
        /**
         * @PHPCRODM\Id
         */
        private $id;

        /**
         * @PHPCRODM\ParentDocument
         */
        private $parent;

        /**
         * @PHPCRODM\Nodename
         */
        private $name;

        /**
         * @PHPCRODM\Children
         */
        private $children;

        /**
         * @PHPCRODM\String
         */
        private $title;

        /**
         * @PHPCRODM\String
         */
        private $content;

        // getters and setters for the fields above go here
    }

Note that in addition to the usual data types (for example, String), annotations can also declare references to child or parent documents.

To those unfamiliar with the Data mapper pattern, such classes may look a bit like Active record (hello, Rails and Yii folks), but they are not.
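
To show where the line between the two patterns runs, here is a tiny sketch in plain PHP. The `SketchArticle` and `SketchArticleMapper` classes are invented for this example and have nothing to do with the real DocumentManager: the document knows nothing about storage, and a separate mapper object does all the persisting, with a PHP array standing in for the repository.

```php
<?php
// Plain document class: no base class and no save()/find() methods of its own.
class SketchArticle
{
    public $path;
    public $title;

    public function __construct(string $path, string $title)
    {
        $this->path = $path;
        $this->title = $title;
    }
}

// Hypothetical mapper that owns all storage logic.
// A PHP array stands in for the content repository.
class SketchArticleMapper
{
    private $storage = [];   // path => stored fields
    private $pending = [];   // documents scheduled for writing

    public function persist(SketchArticle $article): void
    {
        $this->pending[] = $article;
    }

    public function flush(): void
    {
        foreach ($this->pending as $article) {
            $this->storage[$article->path] = ['title' => $article->title];
        }
        $this->pending = [];
    }

    public function find(string $path): ?SketchArticle
    {
        if (!isset($this->storage[$path])) {
            return null;
        }
        return new SketchArticle($path, $this->storage[$path]['title']);
    }
}

$mapper = new SketchArticleMapper();
$mapper->persist(new SketchArticle('/doc', 'My first document'));
var_dump($mapper->find('/doc') === null); // nothing is stored before flush()
$mapper->flush();
echo $mapper->find('/doc')->title, "\n";
```

With Active record, the `save()` call would live on the article itself; here the article can be created and tested without any storage at all.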

How to work with such a document?

    require_once '../bootstrap.php';

    // fetch the root document
    $rootDocument = $documentManager->find(null, '/');

    // create a new document
    $doc = new \Demo\MyDocument();
    $doc->setParent($rootDocument);
    $doc->setName('doc');
    $doc->setTitle('My first document');
    $doc->setContent('The document content');

    // create a second document, a child of the first
    $childDocument = new \Demo\MyDocument();
    $childDocument->setParent($doc);
    $childDocument->setName('child');
    $childDocument->setTitle('My child document');
    $childDocument->setContent('The child document content');

    // tell the document manager which documents it should manage
    $documentManager->persist($doc);
    $documentManager->persist($childDocument);

    // write all pending changes to the repository
    $documentManager->flush();

    require_once '../bootstrap.php';

    $doc = $documentManager->find(null, "/doc");

    echo 'Found '.$doc->getId()."\n";
    echo 'Title: '.$doc->getTitle()."\n";
    echo 'Content: '.$doc->getContent()."\n";

    foreach ($doc->getChildren() as $child) {
        if ($child instanceof \Demo\MyDocument) {
            echo 'Has child '.$child->getId()."\n";
        } else {
            echo 'Unexpected child '.get_class($child)."\n";
        }
    }

    // remove the document
    $documentManager->remove($doc);
    $documentManager->flush();

One small note: in an ORM, data is usually retrieved with queries. In ODM, you use the hierarchy for that instead. You can still run queries if you really want to, though.

PHPCR ODM already implements two very important features: versioning and multilingual support. Let's start with the first.

Versioning in PHPCR comes in two kinds: simpleVersionable and versionable. Simple versioning provides checkin/checkout methods and a linear history. A checkin creates a new version of the node and makes the node read-only. To write to it again, you need to do a checkout.

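Full versioning additionally offers non-linear history (for which PHPCR-ODM provides no helper methods) and version labels (not implemented in Jackalope at the time of writing). Labels can be added to the versions of a node, but a given label may occur only once per version history (so to label another version, you first have to remove the label from the one that carries it).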

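Full versioning corresponds to the mix:versionable mixin in PHPCR. Note that the complete PHPCR Version API remains available alongside PHPCR ODM: you can get a PHPCR\VersionManager from the PHPCR session.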

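Version names are generated by PHPCR. There is no concept of commit messages (this avoids overhead for those who do not need them). If you need one, you can keep it in an ordinary field of the document.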

Old versions are read-only: the only operations available on them are restoreVersion() and removeVersion().

To make a document versionable, annotate it accordingly:

    /**
     * @Document(versionable="simple")
     */
    class MyPersistentClass
    {
        /** @VersionName */
        private $versionName;

        /** @VersionCreated */
        private $versionCreated;
    }


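    $article = new Article();
    $article->id = '/test';
    $article->topic = 'Test';
    $dm->persist($article);
    $dm->flush();

    // create a new version of the document as it is now
    $dm->checkpoint($article);

    $article->topic = 'Newvalue';
    $dm->flush();

    // get the list of versions
    $versioninfos = $dm->getAllLinearVersions($article);
    $firstVersion = reset($versioninfos);

    // load a snapshot of the old version
    $oldVersion = $dm->findVersionByName(null, $article->id, $firstVersion['name']);
    echo $oldVersion->topic; // "Test"

    // the current document is unaffected
    $article = $dm->find(null, '/test');
    echo $article->topic; // "Newvalue"

    // restore the document to the old version
    $dm->restoreVersion($oldVersion);

    // the document is refreshed
    echo $article->topic; // "Test"

    // change the document again and create another version
    $article->topic = 'Newvalue';
    $dm->flush();
    $dm->checkpoint($article);

    // remove the old version from the history (not allowed for the last version)
    $dm->removeVersion($oldVersion);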
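
To get a feel for the checkin/checkout cycle of simple versioning without a running repository, here is a toy model in plain PHP. The `SketchVersionedNode` class is hypothetical and only mimics the behavior described above: a linear history of snapshots, a node that becomes read-only after checkin, and a checkpoint that is simply a checkin followed by a checkout.

```php
<?php
// Hypothetical in-memory model of simple (linear) versioning; not the PHPCR API.
class SketchVersionedNode
{
    private $properties = [];
    private $history = [];       // linear list of property snapshots
    private $checkedOut = true;

    public function setProperty(string $name, $value): void
    {
        if (!$this->checkedOut) {
            throw new LogicException('Node is checked in (read-only); call checkout() first');
        }
        $this->properties[$name] = $value;
    }

    public function getProperty(string $name)
    {
        return $this->properties[$name] ?? null;
    }

    // Store a snapshot as a new version and make the node read-only.
    public function checkin(): int
    {
        $this->history[] = $this->properties;
        $this->checkedOut = false;
        return count($this->history) - 1;  // index of the new version
    }

    public function checkout(): void
    {
        $this->checkedOut = true;
    }

    // A checkin immediately followed by a checkout.
    public function checkpoint(): int
    {
        $version = $this->checkin();
        $this->checkout();
        return $version;
    }

    public function restore(int $version): void
    {
        $this->properties = $this->history[$version];
    }
}

$node = new SketchVersionedNode();
$node->setProperty('title', 'About');
$first = $node->checkpoint();        // version 0 holds "About", node stays writable
$node->setProperty('title', 'Ups');
$node->checkin();                    // version 1 holds "Ups", node is now read-only
$node->restore($first);
echo $node->getProperty('title'), "\n";
```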

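Now for multilingual support. Translation handling is configured on the DocumentManager, after which find() returns the document in the locale picked by the locale chooser. An example of a translatable document: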

    /**
     * @PHPCRODM\Document(translator="attribute")
     */
    class MyPersistentClass
    {
        /**
         * The language this document is currently in
         * @Locale
         */
        private $locale;

        /**
         * Untranslated property
         * @Date
         */
        private $publishDate;

        /**
         * Translated property
         * @String(translated=true)
         */
        private $topic;

        /**
         * Language-specific image
         * @Binary(translated=true)
         */
        private $image;
    }

    // bootstrap the DocumentManager (with the locale fallback order)
    $localePrefs = array(
        'en' => array('fr'),
        'fr' => array('en'),
    );
    $dm = new \Doctrine\ODM\PHPCR\DocumentManager($session, $config);
    $dm->setLocaleChooserStrategy(new LocaleChooser($localePrefs, 'en'));

    // then use translations:
    $doc = new Article();
    $doc->id = '/my_test_node';
    $doc->author = 'John Doe';
    $doc->topic = 'An interesting subject';
    $doc->text = 'Lorem ipsum...';

    // persist the document in English
    $dm->persist($doc);
    $dm->bindTranslation($doc, 'en');

    // change the content to French and persist the translation
    $doc->topic = 'Un sujet intéressant';
    $dm->bindTranslation($doc, 'fr');

    // the document is now in French
    echo $doc->locale; // fr

    // write everything to PHPCR
    $dm->flush();

    // get the document in the default language
    // (English, given the bootstrap above)
    $doc = $dm->find(null, '/my_test_node');

    // get the document in French
    $doc = $dm->findTranslation(null, '/my_test_node', 'fr');
    $doc->topic = 'nouveau';
    $dm->flush(); // the French translation is updated
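
The `$localePrefs` array above reads naturally as "for this locale, try these fallbacks in order". How the real LocaleChooser resolves it is not shown here, so the sketch below is only my simplified model of the idea: the `chooseTranslation()` function is hypothetical and works on a plain array of available translations.

```php
<?php
/**
 * Hypothetical resolver, not the PHPCR-ODM LocaleChooser: return the first
 * locale, in preference order, for which a translation exists.
 *
 * @param array  $translations locale => translated value
 * @param string $requested    locale the caller asked for
 * @param array  $localePrefs  locale => ordered list of fallback locales
 * @param string $default      fallback for locales missing from $localePrefs
 */
function chooseTranslation(array $translations, string $requested, array $localePrefs, string $default): ?string
{
    $fallbacks = $localePrefs[$requested] ?? [$default];
    $order = array_merge([$requested], $fallbacks);
    foreach ($order as $locale) {
        if (isset($translations[$locale])) {
            return $locale;
        }
    }
    return null; // nothing suitable was found
}

$localePrefs = [
    'en' => ['fr'],
    'fr' => ['en'],
];
$topic = ['fr' => 'Un sujet intéressant'];  // only a French translation exists

$locale = chooseTranslation($topic, 'en', $localePrefs, 'en');
echo $locale, ': ', $topic[$locale], "\n";
```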


, , ( ), Solr/ElasticSearch Doctrine DBAL MongoDB. Jackrabbit ( Oak) , - PHPCR .

Summarize. ODM :

PHP Content Repository Jackalope Midgard2 ( Jackrabbit )
PHPCR-ODM Doctrine Common

Source: https://habr.com/ru/post/197524/
