
Domestic server manufacturer = self-assembly?

For a long time there has been a “holy war” over what to call Russian companies that assemble computer and server equipment: manufacturers or “self-assemblers”? Some believe that if a server or PC is built on a conveyor or on assembly stands, has been tested, and carries a warranty on the finished product rather than on individual components from the component manufacturer, then the company can safely be called a manufacturer. Others, on the contrary, count as manufacturers only those companies that solder the boards themselves, and regard all the rest as mere self-assemblers.
If you dig into the issue and look at the examples of world brands, the line turns out to be less obvious than it seems at first glance. Few global manufacturers of server hardware carry out the full production cycle on their own. For the most part they limit themselves to developing the design and/or using the resources of ODM manufacturers: Foxconn, Quanta, Mitac, Chenbro, Supermicro, and others.
Today, oligopolies are firmly entrenched in almost every area of computer and server component production. For server hardware manufacturers, “reinventing the wheel” is pointless and economically unprofitable.


Consider what is inside the servers of some recognized brands. What do we see? A Huawei chassis, or is it Supermicro? A Lenovo / IBM RAID controller, or maybe LSI / Avago? HP disks, or Seagate? And the memory is not HP, but Hynix or Samsung.


HP drive with a Seagate part number

HP drive with a Hitachi part number


This is what the memory of many server manufacturers looks like: original Hynix, Kingston, and Samsung stickers

Although motherboards are made to a unique design, they are soldered at the plants of the same Foxconn, MiTAC, and other ODMs.
Clearly, the changes made to the original products can be significant: developing a unique design, or writing a custom BIOS. But sometimes it comes down to simply sticking on labels, along with the corresponding software and hardware validation.
There is also the question of the actual assembly. Many imported brands are shipped to Russia disassembled (or are handled by distributors). As a result, assembly is still done here in Russia, by distributors, integrators, or end users. Some servers are nevertheless sold as finished products built at the original assembly plants, but such systems almost always have a long delivery time, from 8-12 weeks and sometimes more.

The main features of a branded server are:
1. Maximum compatibility and stability of all system components, achieved in part through optimized firmware and drivers.
2. The range of services that the customer receives with the server and that accompany it throughout its operation. This includes configuration optimization, technical presales, assembly and testing, technical advice, and warranty service.

What is a domestic manufacturer of server hardware?


Lately, all and sundry have taken up server assembly. Even some component distributors, disrupting the established structure of relations with their partners, are no longer embarrassed to sell equipment of their own assembly directly to end customers.
Assembly surprises no one these days. Servers are assembled everywhere: from radio markets and online stores to resellers and integrators.
But there are also specialized companies. Several such manufacturers and integrators operate in the domestic market today, producing server equipment under their own brands.
STSS is one of those that make their own product, and using our own example I want to show how we differ from most companies that offer assembly services alongside component sales.
To do this, I will describe the company's functional divisions, whose very presence and level of competence distinguish a responsible manufacturer from a self-assembler.

Laboratory


The laboratory's tasks include preparing a new server model for series production. With ready-made platforms (Intel, Supermicro, Asus, Tyan) there are fewer difficulties, because the main components, such as the case with backplanes and disk cages, power supply units, motherboard, and cooling fans, are already included in the platform and are compatible with each other. But when a standard model with an extensive configurator has to be created, full testing and preparation turn into a long and painstaking process. It includes a number of key activities in which engineers run into problems, many of which never arise during normal operation:

1. Check the mechanical compatibility of components. Identify incompatibilities in overall dimensions or cable lengths.
Problems that sometimes come up at this stage include:
- The case has no proper place to install and securely fasten the RAID controller battery (fixing it with electrical tape is not an option)
- The design of the case's rear panel does not allow the video cable connector to be screwed in (a trifle, but it happens)
- The video card covers slots or connectors, or rests against components on the motherboard
- The wiring of the redundant power supply rests against the fan of the hot-swap cage
- Power cables are too short. The length may nominally be enough, but with proper routing the cable ends up under tension. Assembly practice discourages running a cable across the airflow, because that reduces cooling efficiency, but tension and the risk of a break at the connector are not acceptable either. Within this constraint, the engineer selects a suitable power supply unit or draws up cable-length requirements for ordering a batch from the power supply manufacturer.

2. Verify software compatibility. This covers interaction with component manufacturers, preparation of firmware, and compatibility testing at the software and hardware level.
Examples of problems at this stage:
Matrox Mura MPX cards did not start on early BIOS versions of the Supermicro X9 series. There were conflicts between an Intel SSD and an Intel RAID controller, and once an Asus video card refused to start on an Asus motherboard, although each worked perfectly on its own. Incompatibility between BIOS and driver versions is one of the most common problems. Sometimes a stable configuration has to be found by rolling back and updating firmware and drivers and changing settings several times, or by asking the manufacturer to fix the problem and provide a corrected BIOS or driver version.

3. Verify electrical compatibility. This stage establishes whether every power supply line can deliver the power required by all connected devices at maximum load.
An example of a per-line limit:
There were cases when one of the 12 V lines could not sustain the initial spin-up of the disks in a hot-swap cage connected to the integrated SATA controller. Discrete controllers can start disks with a delay to ramp the load up smoothly, whereas the integrated one starts them all at once. As a result, the line dropped out even though the total load on the power supply did not exceed 40%.
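The 12 V line case above can be illustrated with a short Python sketch. Every number in it (the rail limit, the per-disk spin-up and idle currents, the disk count, the power supply wattage) is a hypothetical value chosen for illustration, not a measurement from our laboratory, and the function name is made up for this example.

```python
from dataclasses import dataclass

@dataclass
class Disk:
    spinup_amps: float  # peak 12 V current while the spindle starts (assumed value)
    idle_amps: float    # 12 V current once the disk is spinning (assumed value)

def peak_rail_current(disks, group_size):
    """Worst-case 12 V current when disks are started in groups of group_size.

    While one group spins up, all previously started disks draw idle current.
    """
    peak = 0.0
    for start in range(0, len(disks), group_size):
        running = sum(d.idle_amps for d in disks[:start])
        starting = sum(d.spinup_amps for d in disks[start:start + group_size])
        peak = max(peak, running + starting)
    return peak

RAIL_LIMIT_AMPS = 18.0  # hypothetical limit of a single 12 V line
PSU_WATTS = 800.0       # hypothetical total power supply rating
disks = [Disk(spinup_amps=2.0, idle_amps=0.5) for _ in range(12)]

# Integrated controller: every disk starts at the same moment.
all_at_once = peak_rail_current(disks, group_size=len(disks))
# Discrete controller: staggered start, two disks at a time.
staggered = peak_rail_current(disks, group_size=2)

psu_load_fraction = all_at_once * 12.0 / PSU_WATTS

print(f"all at once: {all_at_once} A, staggered: {staggered} A")
print(f"share of total PSU power at simultaneous start: {psu_load_fraction:.0%}")
```

With these assumed figures, the simultaneous start exceeds the limit of the single line while using well under half of the power supply's total rating, which is the same pattern as in the case described above.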

An example of electromagnetic incompatibility:
In the routing for one of the assembled products, the power and SATA signal cables were allowed to be laid in one bundle. However, on a later unit, load testing showed unstable operation of one specific hard disk model. The problem appeared extremely rarely, and only after many hours of 100% load on the disk subsystem, which is uncommon even in a server. Troubleshooting showed that separating the power and SATA cables made the errors disappear.

4. Check the thermal envelope. Overheating of a discrete controller or video card is a common problem in thermal testing, especially in compact cases. Testing is carried out at maximum load in a test zone at 35 degrees Celsius. A component that exceeds its maximum allowable temperature by even 1 degree blocks the launch into series production. In practice, even a configuration whose temperatures come within 10 degrees of the maximum allowable values is treated as a problem. And this is despite the fact that the customer will probably never create such conditions in their server room when running real tasks. Cooling problems often arise from airflow that is insufficient or unbalanced.
Example:
A processor with a powerful active cooler once overheated in a 2U case only because the case cooling system did not have enough exhaust draft. The processor cooler simply churned hot air inside the case, and a single small 40 mm fan on the back of the server lowered the processor temperature by 15 degrees.
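The acceptance rule for the thermal check can be written down as a small Python sketch. The two thresholds come from the text (any excess over the limit fails, and less than 10 degrees of headroom is treated as a problem); the component names, readings, and limits are hypothetical, and the three verdict labels are our own naming for this illustration.

```python
SAFETY_MARGIN_C = 10.0  # headroom below the limit required by the rule in the text

def thermal_verdict(measured_c, max_allowed_c, margin_c=SAFETY_MARGIN_C):
    """Classify one component reading taken at maximum load in the hot test zone."""
    if measured_c > max_allowed_c:
        return "fail"      # over the limit: no series launch
    if max_allowed_c - measured_c < margin_c:
        return "marginal"  # under the limit, but with too little headroom
    return "pass"

# Hypothetical readings: (measured temperature, maximum allowable), degrees Celsius.
readings = {
    "cpu": (78.0, 95.0),
    "raid_controller": (88.0, 95.0),
    "gpu": (96.0, 95.0),
}

verdicts = {name: thermal_verdict(m, limit) for name, (m, limit) in readings.items()}
series_ready = all(v == "pass" for v in verdicts.values())

print(verdicts)
print("ready for series production:", series_ready)
```

In this sketch a single "marginal" or "fail" verdict is enough to hold the configuration back, which mirrors how strictly the rule is applied.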
The laboratory's job is not only to fix obvious incompatibilities but also to track down intermittent problems and find solutions for them. This set of checks is very resource-intensive and labor-intensive, but it is a mandatory stage that yields a very high level of stability in the final product. The result of preparing a model for series launch is the handover to production of all the necessary firmware and drivers, plus a complete step-by-step process sheet for assembling and configuring the server.
Every compatibility quirk or failure that is found is entered into a dedicated knowledge base containing a description of the problem, possible solutions, know-how, and other important information. This knowledge base is the result of years of continuous work by a team of engineers and is the company's intellectual property.
Sample entries from a laboratory engineer’s workbook (for obvious reasons, I cannot reveal the text in full):






Incoming inspection


This internal procedure checks components as they arrive at the warehouse and keeps defective parts from reaching production. As a result, we manage to hold the actual production time of a server at 3-5 days for standard models, against a stated lead time of 7-10 days. The tests are simplified rather than stress tests, and they screen out components with obvious defects.

Production


Server hardware and storage systems are assembled on assembly stands by experienced engineers in strict accordance with the process sheet. This covers installing and flashing components and initializing RAID arrays. OS installation and driver configuration are performed automatically over a dedicated production network.

Production of a server based on the domestic E-Class server platform from T-Platforms (recording sped up 5x)


Production of a server based on Supermicro components (recording sped up 10x)


Before the server is sent for testing, a quality control engineer evaluates the product against the following criteria:

1. Compliance with the specified configuration.
2. Absence of external damage.
3. Quality of component mounting.
4. Quality of cable routing and fastening.

If the product meets all of the company's assembly standards, it is sent to the test zone.

Stress Testing


On the test bench, the server is subjected to a prolonged load on all subsystems. For this we use a specially developed procedure that drives the server's hardware resources to maximum utilization with a software suite. This makes it possible to detect malfunctioning or incompatible equipment. The procedure is as follows: a script polls the system and determines the hardware inventory and driver versions. Depending on the configuration, the script then runs many tests and combinations of tests in sequence.
These include dozens of consecutive soft reboots; specialized graphics subsystem tests such as SPECviewperf and 3DMark; specialized processor load tests from the manufacturers which, unlike the popular BurnInTest, load absolutely all processor units and push the processor to its rated TDP; our own disk subsystem tests that imitate all load types (linear, random, mixed); and other load tools that exercise and check processors and memory, network and disk controllers, all drives, optical drives, graphics coprocessors, and other additional devices.
Standard testing lasts from 18 to 30 hours, depending on the configuration. This way of testing the finished product lets us almost completely rule out shipping unstable equipment.
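The idea of a script that builds its test sequence from the detected hardware can be sketched in Python as follows. This is only an illustration of the principle, not our actual tooling: the inventory keys and test names are hypothetical, and a real script would detect the hardware itself instead of receiving a ready-made dictionary.

```python
def build_test_plan(inventory):
    """Return an ordered list of test names for the detected hardware.

    inventory maps a device class to the number of devices found.
    """
    # Every server gets the reboot cycle plus processor and memory load.
    plan = ["reboot_cycle", "cpu_load", "memory_load"]
    if inventory.get("disks", 0) > 0:
        # The three disk load types named in the text.
        plan += ["disk_linear", "disk_random", "disk_mixed"]
    if inventory.get("gpus", 0) > 0:
        plan.append("graphics_load")
    if inventory.get("nics", 0) > 0:
        plan.append("network_load")
    if inventory.get("optical_drives", 0) > 0:
        plan.append("optical_read")
    return plan

# Hypothetical inventory of a storage-oriented server without a video card.
storage_server = {"disks": 12, "nics": 2, "gpus": 0}
plan = build_test_plan(storage_server)
print(plan)
```

Keeping the plan as plain data makes it easy to log exactly which tests a given serial number went through.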

Warranty service


Thanks to load testing and outgoing inspection, almost 100% of warranty cases are component failures after prolonged use: hard drives first, then power supplies, and much less often power distribution boards, backplanes, and memory modules. Extremely rarely it is latent defects in motherboards, video adapters, and RAID controllers.
The overwhelming majority of warranty claims fall in the third and last year of the warranty period. The percentage of warranty claims is extremely small, but given the volume of servers produced, the absolute number is substantial. And this is despite all the control and verification measures.
That is why the level of warranty service is one of the main advantages of a quality manufacturer over a so-called “self-assembler”. With a “self-assembled” server, the assembler often ends up as a mere intermediary between the end user and the component manufacturer. Even if the client can contact the assembler directly under warranty, the problem is still passed on to the component manufacturer, which significantly increases the response time. Examination and replacement are often handled by the component manufacturer, and the client has to wait until all the procedures are complete.
Our company, as a self-respecting server manufacturer, provides warranty service on its own.

Features of our warranty:

The warranty period is 3 years. This is the standard minimum period of service-center warranty, and it applies to all server components, even optical drives, fans, and RAID controller batteries, for which the component manufacturer's stated warranty usually does not exceed one year and in some cases is 6 months. The three-year warranty nevertheless covers the entire product.

Service centers in all major cities of Russia. More than 70 service centers handle STSS Flagman equipment under warranty and can resolve simple and moderately complex problems on the spot.

Fault diagnosis by our own service department. First, this shortens the response time. Second, the product is diagnosed as an assembled unit, rather than just the allegedly faulty component, which avoids swapping one working component for another and reduces repair time. And if the engineer has reliably identified the fault remotely from data supplied by the client, the faulty component can be replaced in advance; this mainly applies to disks, memory, and hot-swappable power supplies.

Prompt replacement. When a warranty case is confirmed, the replacement comes from our own warehouse. Dealings with the component manufacturer over the replacement stay behind the scenes for the end user. Our warranty avoids the red tape of the following scheme:
Client -> Assembler -> Component Manufacturer -> Assembler -> Client
Our warranty service scheme is more convenient for the client:
Client -> Server Manufacturer -> Client

Extended warranty service. Warranty plans with a longer service period and shorter response and repair times let the client choose the optimal level of insurance against long equipment downtime.

Technical presale


Not all of our clients can estimate the potential load on a server, taking into account the specifics of the software and the tasks performed.
If the client knows the operating conditions and the type and scale of the tasks planned for the server or storage system, but does not know how to translate that into a hardware configuration, our presales engineer selects the necessary configuration.
The calculation is based on the following considerations:
1. Based on the type and complexity of the planned tasks, the optimal configuration of the compute, disk, and graphics subsystems and the type and performance of the network controllers are determined.
2. To allow for future scaling, the platform is chosen with room for expansion. Depending on the task, free disk bays, a more powerful power supply, and free slots for PCI-E devices and RAM are provisioned. In some cases, the customer needs a free socket to add computing power later.
3. Based on the server's fault-tolerance requirements, the RAID level is selected, and redundant power supplies and other fault-tolerance elements are provisioned. Redundancy of SAS expanders and RAID controllers is considered, and in some cases, when the highest availability is needed, a cluster with no single point of failure is designed.
4. Depending on how critical a server failure is and what business downtime costs, an optimal warranty plan is selected to minimize the customer's costs in the event of a failure.

At first glance this may not seem difficult, but in reality selecting a truly optimal configuration for specific tasks is far from simple. One has to assess which hardware resources the software actually uses. What matters more in this case: processor frequency or the number of cores? The number of memory channels, memory frequency, or memory capacity? IOPS or MB/s? Video memory capacity or GPU power?
Identifying and eliminating bottlenecks makes it possible to build a balanced system that performs the customer’s tasks with maximum efficiency.
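The "IOPS or MB/s?" question becomes clearer with a little arithmetic: throughput is simply the operation rate multiplied by the block size. The Python sketch below uses hypothetical figures for a single drive under two different workloads; they are assumptions for illustration, not benchmark results.

```python
def throughput_mb_s(iops, block_bytes):
    """Convert an I/O rate into throughput in MB/s (1 MB = 10**6 bytes)."""
    return iops * block_bytes / 1_000_000

# Hypothetical figures for one drive under two different workloads.
random_small = throughput_mb_s(iops=200, block_bytes=4 * 1024)         # database-like access
sequential_large = throughput_mb_s(iops=150, block_bytes=1024 * 1024)  # video-stream-like access

print(f"random 4 KiB:     {random_small:.2f} MB/s")
print(f"sequential 1 MiB: {sequential_large:.2f} MB/s")
```

Under these assumptions the same drive delivers under 1 MB/s on small random requests and over 150 MB/s on large sequential ones, so a configuration sized by MB/s alone can badly miss for an IOPS-bound workload.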
If the task is non-trivial and the presales engineer cannot design a solution right away, the remaining option is a test under real conditions on real workloads, followed by an analysis of the results. This can be done either in our laboratory or at the customer's site.

Product marketing


Product marketing aims to offer the client not just a list of components in a given platform but a ready-made software and hardware solution for specific tasks. Servers, storage systems, and workstations are classified and positioned not only by their technical characteristics but also by function and role.
Moreover, these products are balanced, tested, and optimized for the user's specific tasks.
Examples of such solutions are servers for video surveillance or video conferencing, virtualization hosts, graphics farms, and fault-tolerant clusters. All this lets the customer settle on a configuration quickly and not “reinvent the wheel”.

Why is our approach in demand in the market?


Has anyone wondered why we all use paid services at all? Not only unique ones, but also those we could replace with our own resources.
For example: delivery, car washes, car service, various agencies (travel, organization of holidays and events, and so on), laundries, canteens. The list is endless.
The same is true in business. There are companies that use third-party services in almost every category not related to their core business process and direction. Examples include security companies, cleaning companies, IT outsourcing, and staffing agencies. Why do companies pay considerable money for such services rather than try to perform them on their own? Why do organizations rent warehouses and data centers and use the services of logistics companies? Are they unable to do all this themselves?
As a rule, the leaders of such enterprises are very good at counting money.
Why inflate the workforce and, with it, the number of managers for those workers, conduct training, and expand office space, when you can use a specialized company that will do everything well, quickly, and most likely for less money? Meanwhile, the company keeps working and earning from what it specializes in and was created for.
I see two main reasons why organizations outsource:
1. Money. The cost of comprehensively developing and maintaining the company's own infrastructure either does not pay off or takes a very long time to pay off.
2. And money. The time and resources freed up by outsourcing can earn more than outsourcing costs.
A competent financier or CIO will tell you that OPEX is almost always better than CAPEX.

Our customers come from various industries and segments of Russian business and from government entities. Some have an IT department, others do not. Some have the necessary competence in server hardware, others do not.
There are customers who could do most of what we offer on their own, yet they come to us. Why?
Because by turning to us they get a finished, certified product with a quality guarantee. By avoiding all of the difficulties described above, our customers can focus on their core business, be it software development or system integration, trading or manufacturing.

Conclusion


I hope I have managed to draw a more or less formal distinction between manufacturers and “self-assemblers” using our company as an example. We continue to develop and expand our range of services while keeping to our main direction: servers, storage systems, and integrated infrastructure solutions. At the moment we are completing the reorganization of the demo zone in our Moscow office. It will showcase virtualization and clustering solutions, deployed video surveillance systems from major domestic developers, and active noise reduction solutions for server cabinets.
Our demo zone is open to customers with real-life tasks that require performance testing on a real software and hardware solution. After the reorganization, it will also be open for remote testing and remote demonstrations of solutions.

Thank you for your attention. We look forward to your comments and your responses in the survey!

Source: https://habr.com/ru/post/271481/

