
How to test development platforms properly and avoid holy wars



Of course, the title of this post looks a bit rhetorical. Because:
1) There are no evaluation methods or metrics recognized by all market players (unlike in the automotive industry, computer hardware, or sports). This is the main obstacle to reliable and adequate test results. But the industry successfully solves it as it matures.

2) The problem of holy wars is harder. CMSs are platforms, and platforms (even purely technical ones) are among the most fundamentalist, conservative and quasi-religious concepts in human culture. Behind every technological or spiritual platform stand living people, its adherents. And people perceive the world slightly differently, through the prism of their platform's values (Orthodoxy vs. Protestantism, liberals vs. patriots, iOS vs. Android, procedural programming vs. OOP, spaghetti code vs. MVC templates, and so on).
Therefore, wherever an attempt to compare platforms appears, a holy war automatically begins. Many people simply refuse to accept arguments that contradict the dogmas, values and ideas of their platform. No reasonable researcher today would risk publishing a comparative test of the Bible and the Koran based on the opinions of theology students.

Nevertheless, it is possible to compare CMSs correctly; it is just much more expensive and difficult than handing a lab over to a few students. Reliability is a question of methods, metrics, testing conditions, judges and recognition by the majority of market professionals. Moreover, adequate research methods greatly reduce the flame-war potential: even stubborn fanatics find it harder to argue with clear, verified facts.

After everything that has been written here and here , I cannot help but offer the community my vision of methods for comparing and choosing development platforms, a vision supported by many colleagues in the market.



Ratings, tests or studies



Ratings reflect the popularity of different products on the market. We already have two or three ratings with completely different results. They speak to product popularity, not product quality (although quality and popularity are correlated).

Tests compare the consumer properties of products. They are not tied to popularity, although it is always the most popular products that get compared.

There are no CMS tests on the market. Every developer or site customer tests CMSs by a method only they understand and chooses based on their own results.

Comprehensive studies of the impact of different CMSs on the economics of web development or of website ownership are what developers and their customers really need. They would carry the most useful information. And they, unfortunately, are the hardest to organize.

Whole or individual properties



Comparing two or three of the best universal CMSs by their technical qualities and picking the single best one is nearly impossible. Leading products lead precisely because they compensate for their individual shortcomings with individual merits, so the leaders' overall quality is comparable. And comparing leaders with outsiders is pointless.

It is simpler and more realistic either to narrow the task by scope (the best CMS for a blog, for a store, for a fast site) or to split it into tests of individual properties (performance, development speed, ...).

Sympathies or metrics



This is the key point of my scientific dispute with Mr. Ovchinnikov. I believe (and I am not alone) that it is impossible to evaluate the technical CHARACTERISTICS of products based on people's opinions about them. Otherwise, instead of comparing the CHARACTERISTICS of products, we get a comparison of OPINIONS ABOUT THE CHARACTERISTICS of products.
Opinions are studied in sociological surveys, not in technical tests.

Take, for example, the evaluation of athletes' performances (essentially the same comparative tests).
If one runner ran the hundred meters in 9.9 seconds and another in 9.8, people's opinions (even the judges') do not affect the result.
In figure skating the metrics are less precise; people give the marks. But the inaccuracy is reduced by a large number of judges, their membership in different countries and, most importantly, their professionalism. Nobody polls the sympathies of pensioners and housewives to pick the winning figure skater, although pensioners and housewives are the main consumers of the skaters' product. It is just as incorrect to poll students' opinions when testing a CMS on the grounds that students are CMS consumers.


Not to mention that it is easy to sway people's opinions simply by selecting the right people, the right tasks and the right test conditions. Instrument readings under a clearly defined test setup are much harder to challenge.

CMSs, like different sports, have easily measurable properties and quite abstract ones (for example, the design of the installer or the icons).

How to evaluate the measurable qualities of a CMS? As in running.
Just measure them with instruments and metrics. The instruments are eye trackers, stopwatches and video cameras. The metrics are time and other costs of completing a task, all else being equal.

How to evaluate the immeasurable qualities of a CMS? As in figure skating.
Poll the opinions of several recognized and independent experts. Where to find such experts is another question; more on that below.

Convenience is measurable



What is familiar seems convenient to us, and what is unfamiliar seems inconvenient. This greatly distorts the real notion of convenience. Pushkin's quill pen and ink probably seemed to him a handy tool for working with text. Show him a laptop with Word, and he would hardly have recognized its advantages. But had Pushkin spent the time and effort to master the laptop, the quill would have stopped seeming such a convenient tool.

I am absolutely sure that convenience is not an abstraction. The convenience of a CMS is not a relative but an absolute (and therefore measurable) parameter, if we manage to set aside people's personal preferences and experience.
Convenience should therefore be evaluated not by polling opinions ("is it convenient for you?") but by measurement.

You can measure the convenience of a CMS by the time it takes to master it (time spent completing tasks on first acquaintance) and by the time and number of actions (clicks) needed to complete individual use cases. The research methodology and the use cases should be developed by usability specialists.
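As an illustration, such time-and-clicks measurements can be aggregated very simply. The sketch below is hypothetical (the CMS names, tasks and figures are invented); it only shows the kind of per-CMS summary a usability test of this sort could produce.

```python
from statistics import mean

# Hypothetical trial records: (cms_name, task, seconds_to_complete, clicks).
# In a real study these would come from stopwatch and screen recordings.
trials = [
    ("CMS-A", "create_page", 48.2, 12),
    ("CMS-A", "create_page", 55.1, 14),
    ("CMS-B", "create_page", 71.4, 21),
    ("CMS-B", "create_page", 64.9, 19),
]

def summarize(trials):
    """Group trials by (cms, task) and report mean time and mean clicks."""
    groups = {}
    for cms, task, seconds, clicks in trials:
        groups.setdefault((cms, task), []).append((seconds, clicks))
    summary = {}
    for key, rows in groups.items():
        times = [t for t, _ in rows]
        click_counts = [c for _, c in rows]
        summary[key] = {
            "mean_time_s": round(mean(times), 1),
            "mean_clicks": round(mean(click_counts), 1),
        }
    return summary

print(summarize(trials))
```

With more testers per group, the same table could also carry spreads (standard deviations) so that differences between CMSs can be judged against measurement noise.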

In the '90s, when we did not yet know the words "usability", "interfaces" and "CMS", our university taught a subject called "Scientific Organization of Labor", where all these principles were formulated for designing the workplace of an accountant or a crane operator's control console.

Nothing fundamentally new has appeared in the scientific evaluation of interface convenience since then. What has appeared are devices such as the eye tracker, which make it possible to evaluate convenience even more precisely than professional experts, let alone students. Incidentally, I spoke about this experience at the last User Experience conference. But if there are no eye trackers, a stopwatch for measurements and a video camera to record the experiments will do.

Methodology



An adequate test of CMS leaders is one whose methodology and experts are agreed upon by several market players.
How I see it:

1. The test should include one or two popular open-source systems and two or three popular commercial systems.

2. The CMS characteristics under test are tested separately.
In each test, the influence of external factors on the parameter being measured is neutralized. That is, the measurement of convenience or speed must not be affected by server configuration quality and the like. All other factors should be equal and adequate to the requirements of each platform under test.

3. Target audiences are divided into separate groups:


4. Each tester performs certain actions, and the test yields a MEASURABLE result. All tests are organized so that the measurements cannot depend on the opinion of the tester or the subject. Judges (experts) analyze the measurements to summarize the results.

5. Testers' opinions are taken into account only when evaluating abstract characteristics, for example interface style. There should be few unmeasurable criteria, no more than 20-30% of all tests.
It is advisable to involve domain experts to evaluate abstract characteristics.
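To make the 20-30% cap concrete, here is a minimal sketch of a weighted scorecard in which measured metrics carry most of the weight and expert marks on abstract qualities are capped at a quarter. All criteria names, weights, normalization bounds and raw values are hypothetical placeholders.

```python
# Hypothetical scorecard: measurable metrics carry >= 70% of the weight,
# expert opinions on abstract qualities at most 25%, per the methodology above.
WEIGHTS = {                   # must sum to 1.0
    "task_time": 0.40,        # measured with a stopwatch (lower is better)
    "performance": 0.35,      # measured pages/sec (higher is better)
    "interface_style": 0.25,  # averaged expert marks, 0..10 (higher is better)
}

def normalize(value, best, worst):
    """Map a raw value onto 0..1, where `best` -> 1.0 and `worst` -> 0.0.
    Works for both lower-is-better and higher-is-better metrics."""
    return (value - worst) / (best - worst)

def total_score(raw):
    """Combine measured metrics and expert marks into one weighted score."""
    score = 0.0
    score += WEIGHTS["task_time"] * normalize(raw["task_time"], best=30, worst=120)
    score += WEIGHTS["performance"] * normalize(raw["performance"], best=200, worst=20)
    score += WEIGHTS["interface_style"] * raw["interface_style"] / 10
    return round(score, 3)

cms_a = {"task_time": 45, "performance": 150, "interface_style": 7}
print(total_score(cms_a))
```

The point of the sketch is the structure, not the numbers: the expert-scored line can never dominate the total because its weight is fixed below the agreed cap.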

The economic effect of using the platform



Any technological achievement is useless if it has no economic effect. Any CMS quality is useless if it does not boost productivity in developing or operating a site.

That is, all improvements in the technical qualities of a CMS (performance, convenience, price, ...) show up either in the cost of owning the site or in the cost of developing it. And only from this economic point of view do those technical qualities have value in the eyes of the business.

It is clear that a methodology for studying the economic effect of different technical characteristics is even more complicated than one for the technical characteristics themselves. But it is feasible.

There are two main subjects of economic-effect research:
1. The total cost of ownership of the site, depending on the CMS (internationally known as TCO, Total Cost of Ownership), which takes into account:


2. The total cost of developing the site, depending on the selected CMS, which includes:


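A TCO comparison of this kind can be sketched as a simple sum of one-time and recurring costs over a planning horizon. The cost categories and every figure below are hypothetical placeholders; a real study would fill them in with measured data for each CMS.

```python
# Minimal sketch of a total-cost-of-ownership (TCO) comparison.
# All cost categories and numbers are invented for illustration.

def tco(license_fee, hosting_per_year, admin_hours_per_month, hourly_rate,
        upgrades_per_year_cost, years=3):
    """Sum one-time and recurring costs of running a site over `years` years."""
    recurring_per_year = (hosting_per_year
                          + admin_hours_per_month * 12 * hourly_rate
                          + upgrades_per_year_cost)
    return license_fee + recurring_per_year * years

# A free system with higher assumed administration effort...
open_source = tco(license_fee=0, hosting_per_year=600,
                  admin_hours_per_month=10, hourly_rate=30,
                  upgrades_per_year_cost=400)

# ...versus a paid system with lower assumed administration effort.
commercial = tco(license_fee=1500, hosting_per_year=600,
                 admin_hours_per_month=6, hourly_rate=30,
                 upgrades_per_year_cost=200)

print(open_source, commercial)
```

Even this toy model makes the article's point visible: a zero license fee does not guarantee the lower TCO once recurring costs are counted over several years.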
Who are the judges?



Test results depend on who organized them. For tests to be believed, they must be run by reputable and neutral people who have nothing to prove to themselves or to any of the tested parties, and who have no emotional stake in the results.

This is the second important reason why a CMS test from a web studio that works on that CMS will be incorrect, especially combined with an opaque or inadequate methodology. The same goes for a test from any representative of the web development market (they all have established CMS preferences and business relationships with vendors). Such testers will simply try to prove to themselves and the world the correctness of their choice and their preferences.

We have already seen the head of one CMS's affiliate program "independently" test its performance, and the head of another CMS's affiliate program compile his own "independent" CMS popularity rating. Were we a little more cynical, one of UMI's partners would also run an "independent" study or compile a rating of their own.





Independent researchers must also DEEPLY understand the methodology of running experiments, web development in general, and CMSs in particular.
Where are these people and companies? Do we have them? Maybe they are here on Habr?
We are ready to fully support their professional work regardless of the results of their tests.

Findings



Adequate tests will let customers and developers make a more informed choice of CMS (evaluating more than advertising promises and brand awareness), and will let CMS manufacturers understand their own strengths and weaknesses more clearly.

The article is illustrated with paintings by the artist Nikolai Kopeikin.

Source: https://habr.com/ru/post/108083/

