
OpenMCAPI: running Linux and an RTOS simultaneously on multi-core processors



In the daily practice of an embedded systems developer, one often faces the need to run two or more different operating systems on a multi-core system-on-chip. Usually these are Linux and a specialized RTOS: Linux shoulders the heavy protocol stacks, while the RTOS handles the real-time tasks.

One of the main tasks that arises in such a system is providing an interaction mechanism, that is, inter-core data exchange. If you are interested in one of the solutions based on the open OpenMCAPI library, want to scroll through a couple of dozen lines of program code and see real throughput numbers obtained with this library, welcome under the cut.

The task of inter-core data exchange is usually solved with shared memory and inter-core interrupts, with an interaction layer written on top of them and ported to the operating systems involved. To bring such an API to a standardized form, the Multicore Association (MCA) developed and released the first version of the MCAPI (Multicore Communications API) specification, and the second version was released soon after.
The OpenMCAPI library considered here implements the MCAPI 2.0 specification, was developed by Mentor Graphics Corporation and is open source under a dual BSD/GPL license. The source code can be obtained from the project site, which also provides brief information on building and porting.

Out of the box, the OpenMCAPI library works under Linux using either a virtual transport or shared memory (the latter only on the mpc85xx and mv78xx0 platforms).

The proposed structure of interaction between Linux and the RTOS through OpenMCAPI, divided into abstraction layers, is as follows (see Figure 1):


Figure 1. The structure of interaction between Linux and RTOS through OpenMCAPI

Consider how this structure is implemented, using the source code of the Linux port as an example:
  1. MCAPI Generic is the implementation of the external MCAPI API.
  2. OS Layer is the part of the MCAPI Generic level that contains operating-system-dependent code. It is located in the file libmcapi/mcapi/linux/mcapi_os.c.
  3. Transport Generic is an abstraction layer that provides the shared memory mechanism at the user space level. It is represented by the files libmcapi/shm/shm.c and libmcapi/shm/linux/shm_os.c.
  4. OS Specific Driver is represented by a Linux kernel module that provides direct access to the hardware from user space. The module is located in the libmcapi/shm/linux/kmod folder.

To fully understand the shared memory transport implemented in the OpenMCAPI library, it is necessary to look at the inter-core signaling mechanism and the data structure kept in shared memory.

Further discussion is based on the mpc85xx platform (Freescale's P1020 chip). Software: Linux kernel 2.6.35 with the patches shipped in the Freescale QorIQ_SDK_V1_03 development kit (available for download after registration on their site); RTEMS is used as the real-time operating system (RTOS), obtained from the git repository at git://git.rtems.org/rtems.git.

To implement inter-core signaling, Freescale provides at least two mechanisms:
  1. Interprocessor Interrupts (IPIs) - inter-core interrupts, up to 4 of them, with support for multicast interrupts.
  2. Message Interrupts (MSGRs) - inter-core 32-bit messages, up to 8 of them, generating an interrupt when a message is written to the register.

The OpenMCAPI library uses the MSGRs mechanism to implement the OS Specific Driver for this platform.
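To illustrate the idea (this is not the actual driver code), a minimal sketch of signalling another core via an MSGR might look as follows; the CCSR mapping address, register offset and register stride are assumptions that should be checked against the P1020 reference manual:

/* Illustrative sketch: raise a message interrupt on the other core by
 * writing a 32-bit value into one of the MPIC message registers (MSGRs).
 * CCSRBAR_VA, MPIC_MSGR0_OFFSET and the 0x10 stride are assumptions. */
#include <stdint.h>

#define CCSRBAR_VA        0xffe00000u   /* assumed virtual mapping of CCSR    */
#define MPIC_MSGR0_OFFSET 0x41400u      /* assumed offset of MSGR0 in CCSR    */

static inline void msgr_notify(unsigned int msgr, uint32_t value)
{
    volatile uint32_t *reg = (volatile uint32_t *)(uintptr_t)
        (CCSRBAR_VA + MPIC_MSGR0_OFFSET + msgr * 0x10u);

    *reg = value;   /* the core this MSGR is routed to receives an interrupt */
}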

Consider the data structure contained in shared memory (see Figure 2):


Figure 2. Data structure in shared memory

The shared memory area can be divided into two blocks: the driver management block and the data buffer area. The management block is described by the following structure:

/* SM driver management block */
struct _shm_drv_mgmt_struct_
{
    shm_lock                    shm_init_lock;
    mcapi_uint32_t              shm_init_field;
    struct _shm_route_          shm_routes[CONFIG_SHM_NR_NODES];
    struct _shm_buff_desc_q_    shm_queues[CONFIG_SHM_NR_NODES];
    struct _shm_buff_mgmt_blk_  shm_buff_mgmt_blk;
};

The structure contains the following elements:
  1. shm_init_lock - the global shared memory lock, used to serialize access of the n cores to the shared area.
  2. shm_init_field - a variable holding the master initialization completion key; it takes the value SHM_INIT_COMPLETE_KEY when initialization is finished.
  3. shm_routes - the routing table of inter-core connections; it contains CONFIG_SHM_NR_NODES connections, one per core (node) participating in the exchange. In our case, 2 nodes.
  4. shm_queues - message queues bound to specific nodes; there are CONFIG_SHM_NR_NODES of them. In our case, 2 queues.
  5. shm_buff_mgmt_blk - the structure managing the buffers (SHM_BUFFER) in the data area.
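As an illustration of how these fields are used, here is a minimal sketch (not the library source) of a non-master node waiting for the master to finish initializing the management block; the shm_acquire_lock/shm_release_lock helpers are assumed stand-ins for the real shm_lock operations:

/* Illustrative sketch: poll shm_init_field under the global lock until the
 * master node has written SHM_INIT_COMPLETE_KEY. Lock helpers are assumed. */
static void wait_for_master_init(volatile struct _shm_drv_mgmt_struct_ *mgmt)
{
    for (;;) {
        shm_acquire_lock((shm_lock *)&mgmt->shm_init_lock);   /* assumed helper */
        if (mgmt->shm_init_field == SHM_INIT_COMPLETE_KEY) {
            shm_release_lock((shm_lock *)&mgmt->shm_init_lock);
            break;                      /* master initialization is complete  */
        }
        shm_release_lock((shm_lock *)&mgmt->shm_init_lock);
        /* master is not done yet, poll again */
    }
}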



Before considering the porting process, it is worth giving the flowchart of the low-level communication mechanism through shared memory implemented in OpenMCAPI (see Figure 3):



Figure 3. SDL diagrams of the low-level shared memory communication mechanism (as implemented in OpenMCAPI)

Some explanations for diagrams:

Before proceeding to the description of porting for RTEMS, briefly consider this OS.



RTEMS (Real-Time Executive for Multiprocessor Systems) is a full-featured open-source real-time operating system with support for many open standard application programming interfaces (APIs), POSIX standards and BSD sockets. It is designed for use in space, medical, network and many other embedded devices. RTEMS supports a wide range of processor architectures such as ARM, PowerPC, Intel, Blackfin, MIPS, MicroBlaze, etc. It includes a large stack of network protocols, in particular TCP/IP, HTTP, FTP and Telnet, and provides standardized access to RTC, NAND, UART and other hardware.

Let us proceed to porting OpenMCAPI. According to the document referenced in [1], it is necessary to:
  1. Implement the OS Layer files:
    • libmcapi/mcapi/rtems/mcapi_os.c;
    • libmcapi/include/rtems/mgc_mcapi_impl_os.h.

  2. Implement a compatible shared memory transport file:
    • libmcapi/shm/rtems/shm_os.c.

  3. Add recipes to the waf build system used by OpenMCAPI.

Since the target platform is the P1020 (PowerPC, e500v2) and the port targets an RTOS, where the absence of kernel/user space separation is acceptable, there is no need to write:
  1. libmcapi/include/arch/powerpc/atomic.h;
  2. libmcapi/shm/rtems/kmod/.

There was also no need to write a new OS Layer implementation: since RTEMS supports POSIX-compatible calls, the files mcapi_os.c and mgc_mcapi_impl_os.h were simply copied from the Linux implementation.
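For a sense of what such a POSIX-based OS layer typically wraps, below is a hedged sketch of an event primitive built purely on pthreads calls, which compiles for both Linux and RTEMS thanks to their POSIX support; the names and structure are illustrative and are not the actual contents of mcapi_os.c:

/* Illustrative sketch only: a POSIX condition-variable based event object of
 * the kind an MCAPI OS layer might provide. Names are hypothetical. */
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             signalled;
} os_event_t;

static void os_event_init(os_event_t *ev)
{
    pthread_mutex_init(&ev->lock, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->signalled = 0;
}

static void os_event_wait(os_event_t *ev)
{
    pthread_mutex_lock(&ev->lock);
    while (!ev->signalled)                    /* guard against spurious wakeups */
        pthread_cond_wait(&ev->cond, &ev->lock);
    ev->signalled = 0;
    pthread_mutex_unlock(&ev->lock);
}

static void os_event_set(os_event_t *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->signalled = 1;
    pthread_cond_signal(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}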

The shared memory transport is implemented entirely in the file shm_os.c and includes adapting the calls from the Transport Generic abstraction layer (the file libmcapi/shm/shm.c) and implementing the exchange mechanism via MSGRs.

Functions requiring implementation:

1) mcapi_status_t openmcapi_shm_notify(mcapi_uint32_t unit_id, mcapi_uint32_t node_id) - the function sends a notification to the remote core(s); the implementation follows the diagram in Figure 3. The source code is below:



/* Send a notification to a remote core. */
mcapi_status_t openmcapi_shm_notify(mcapi_uint32_t unit_id,
                                    mcapi_uint32_t node_id)
{
    mcapi_status_t mcapi_status = MCAPI_SUCCESS;
    int rc;

    rc = shm_rtems_notify(unit_id);
    if (rc) {
        mcapi_status = MGC_MCAPI_ERR_NOT_CONNECTED;
    }

    return mcapi_status;
}

static inline int shm_rtems_notify(const mcomm_core_t target_core)
{
    struct mcomm_qoriq_data *const data = &mcomm_qoriq_data;

    /* If the target is the local core, call the interrupt handler directly. */
    if (target_core == mcomm_qoriq_cpuid()) {
        _mcomm_interrupt_handler(NO_IRQ, data);
    } else {
        mcomm_qoriq_notify(target_core);
    }

    return 0;
}

/* Wake up the process(es) corresponding to the mailbox(es) which just
 * received packets. */
static int _mcomm_interrupt_handler(rtems_vector_number irq,
                                    struct mcomm_qoriq_data *data)
{
    register int i;
    void *mbox = data->mbox_mapped;

    for (i = 0; i < data->nr_mboxes; i++) {
        int active;

        switch (data->mbox_size) {
        case 1:
            active = readb(mbox);
            break;
        case 4:
            active = readl(mbox);
            break;
        default:
            active = 0;
        }

        if (active) {
            LOG_DEBUG("%s: waking mbox %d\n", __func__, i);
            (void) rtems_event_send(data->rid, MMCAPI_RX_PENDING_EVENT);
        }

        mbox += data->mbox_stride;
    }

    if (irq != NO_IRQ) {
        mcomm_qoriq_ack();
    }

    return 0;
}


2) mcapi_uint32_t openmcapi_shm_schedunitid(void) - the function returns the number of the current core (that is, the core executing this code); it is implemented trivially by reading a processor register. The source code is below:

/* Get the current CPU id. */
mcapi_uint32_t openmcapi_shm_schedunitid(void)
{
    return (mcapi_uint32_t) ppc_processor_id();
}


3) mcapi_status_t openmcapi_shm_os_init(void) - the function creates and starts the low-level data reception thread; it is implemented with the rtems_task_create and rtems_task_start calls. The source code is below:

/* Now that SM_Mgmt_Blk has been initialized, we can start the RX thread. */
mcapi_status_t openmcapi_shm_os_init(void)
{
    struct mcomm_qoriq_data *const data = &mcomm_qoriq_data;
    rtems_id id;
    rtems_status_code sc;

    if (RTEMS_SELF != data->rid) {
        return MCAPI_ERR_GENERAL;
    }

    sc = rtems_task_create(rtems_build_name('S', 'M', 'C', 'A'),
                           MMCAPI_RX_TASK_PRIORITY,
                           RTEMS_MINIMUM_STACK_SIZE,
                           RTEMS_DEFAULT_MODES,
                           RTEMS_DEFAULT_ATTRIBUTES,
                           &id);
    if (RTEMS_SUCCESSFUL != sc) {
        return MCAPI_ERR_GENERAL;
    }

    /* Save the task id globally. */
    data->rid = id;

    sc = rtems_task_start(id, mcapi_receive_thread, 0);
    if (RTEMS_SUCCESSFUL != sc) {
        perror("rtems_task_start\n");
        return MCAPI_ERR_GENERAL;
    }

    return MCAPI_SUCCESS;
}

static rtems_task mcapi_receive_thread(rtems_task_argument argument)
{
    int rc;

    do {
        rc = shm_rtems_wait_notify(MCAPI_Node_ID);
        if (rc < 0) {
            perror("shm_rtems_wait_notify");
            break;
        }

        MCAPI_Lock_RX_Queue();

        /* Process the incoming data. */
        shm_poll();

        MCAPI_Unlock_RX_Queue(0);
    } while (1);

    printk("%s exiting!\n", __func__);
}

static inline int shm_rtems_wait_notify(const mcapi_uint32_t unitId)
{
    rtems_event_set event_out;
    int ret = 0;

    while (1) {
        LOG_DEBUG("mcomm_mbox_pending start\n");

        (void) rtems_event_receive(MMCAPI_RX_PENDING_EVENT,
                                   RTEMS_DEFAULT_OPTIONS,
                                   RTEMS_NO_TIMEOUT,
                                   &event_out);
        LOG_DEBUG("rtems_event_receive\n");

        ret = mcomm_mbox_pending(&mcomm_qoriq_data, (mcomm_mbox_t)unitId);
        LOG_DEBUG("mcomm_mbox_pending end ret=%d\n", ret);
        if (ret != 0) {
            return ret;
        }
    }

    return 0;
}


4) mcapi_status_t openmcapi_shm_os_finalize(void) - the function stops the low-level data reception thread; it is implemented with the rtems_task_delete call. The source code is below:

/* Finalize the SM driver OS specific layer. */
mcapi_status_t openmcapi_shm_os_finalize(void)
{
    struct mcomm_qoriq_data *const data = &mcomm_qoriq_data;
    rtems_id id = data->rid;
    rtems_status_code sc;

    sc = rtems_task_delete(id);
    if (RTEMS_SUCCESSFUL != sc) {
        return MCAPI_ERR_GENERAL;
    }

    return MCAPI_SUCCESS;
}


5) void *openmcapi_shm_map(void) - the function prepares and configures the MSGRs interface and sets up the shared memory. The source code is below:

/* Fully open the mcomm device and get the shared memory map address. */
void *openmcapi_shm_map(void)
{
    void *shm;
    int rc;
    size_t shm_bytes;

    // low level init
    // mcomm_qiroq_probe();

    shm_bytes = shm_rtems_read_size();
    if (shm_bytes <= 0) {
        perror("read shared memory size\n");
        return NULL;
    }

    /* Initialize the device. */
    rc = shm_rtems_init_device();
    if (rc < 0) {
        perror("couldn't initialize device\n");
        goto out;
    }

    shm = shm_rtems_read_addr();
    if (shm == NULL) {
        perror("mmap shared memory");
        goto out;
    }

    return shm;

out:
    return NULL;
}

static size_t shm_rtems_read_size(void)
{
    struct mcomm_qoriq_data *const data = &mcomm_qoriq_data;

    return (size_t)(data->mem.end - data->mem.start);
}

static inline int shm_rtems_init_device(void)
{
    struct _shm_drv_mgmt_struct_ *mgmt = NULL; /* xmmm */

    return mcomm_dev_initialize(&mcomm_qoriq_data,
                                (uint32_t)&mgmt->shm_queues[0].count,
                                CONFIG_SHM_NR_NODES,
                                sizeof(mgmt->shm_queues[0].count),
                                ((void *)&mgmt->shm_queues[1].count -
                                 (void *)&mgmt->shm_queues[0].count));
}

static void *shm_rtems_read_addr(void)
{
    struct mcomm_qoriq_data *const data = &mcomm_qoriq_data;

    return (void *)data->mem.start;
}


6) void openmcapi_shm_unmap(void *shm) - the function closes the MSGRs interface and releases the shared memory. The source code is below:

/* Fully close the mcomm device and release the shared memory. */
void openmcapi_shm_unmap(void *shm)
{
    /* Deinitialize the device. */
    shm_rtems_deinit_device();

    // low level deinit
    // mcomm_qoriq_remove();
}

static inline int shm_rtems_deinit_device(void)
{
    return mcomm_dev_finalize(&mcomm_qoriq_data);
}

The low-level receive thread function mcapi_receive_thread (see the source code above) deserves a separate note. When the thread starts, the rtems_event_receive call puts it into the event waiting state (using the event mechanism available in RTEMS). When a wake-up event arrives, sent from the interrupt handler (see Figure 3, the "HW-Receive" diagram), the thread locks the receive queue, processes the changes in the shared memory area by calling the internal OpenMCAPI function shm_poll(), and then returns to the waiting state.

Below are the results obtained for the interaction of Linux and RTEMS through OpenMCAPI. The test bench is a Freescale P1020RDB-PB development board with a P1020 processor (2 cores). Frequencies: core - 800 MHz, DDR2 - 400 MHz, CCB - 400 MHz. Linux and RTEMS were run on cores 0 and 1, respectively. The exchange was bidirectional, and the time spent on 10,000 round-trip exchanges was measured. The test results are summarized in the table:

No. | Test description                                   | Time per packet, ms
1   | Symmetric packets of 512 bytes                      | 37.5
2   | Symmetric packets of 52430 bytes                    | 121
3   | Symmetric packets of 100 kB                         | 346
4   | Asymmetric packets, 1 kB / 100 kB (Linux / RTEMS)   | 185
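For reference, a ping-pong exchange of the kind measured above could be written on the Linux side roughly as follows. This is only an illustrative sketch using standard MCAPI 2.0 calls; the domain, node and port numbers, message size and overall structure are assumptions, not the actual test code:

/* Illustrative MCAPI ping-pong benchmark sketch (Linux side).
 * Assumes the RTEMS side echoes every received message back. */
#include <stdio.h>
#include <time.h>
#include <mcapi.h>

#define LOCAL_DOMAIN 0      /* assumed domain id */
#define LOCAL_NODE   0      /* Linux core        */
#define REMOTE_NODE  1      /* RTEMS core        */
#define PORT         1      /* assumed port id   */
#define ITERATIONS   10000
#define MSG_SIZE     512

int main(void)
{
    mcapi_info_t info;
    mcapi_status_t status;
    mcapi_endpoint_t local_ep, remote_ep;
    char tx_buf[MSG_SIZE] = {0}, rx_buf[MSG_SIZE];
    size_t received;
    struct timespec t0, t1;
    int i;

    mcapi_initialize(LOCAL_DOMAIN, LOCAL_NODE, NULL, NULL, &info, &status);
    if (status != MCAPI_SUCCESS)
        return 1;

    /* Create the local endpoint and wait until the remote one exists. */
    local_ep  = mcapi_endpoint_create(PORT, &status);
    remote_ep = mcapi_endpoint_get(LOCAL_DOMAIN, REMOTE_NODE, PORT,
                                   MCAPI_TIMEOUT_INFINITE, &status);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < ITERATIONS; i++) {
        /* Send a packet and wait for the reply echoed by the other core. */
        mcapi_msg_send(local_ep, remote_ep, tx_buf, sizeof(tx_buf), 1, &status);
        mcapi_msg_recv(local_ep, rx_buf, sizeof(rx_buf), &received, &status);
        if (status != MCAPI_SUCCESS)
            break;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("avg round trip: %.3f ms\n",
           ((t1.tv_sec - t0.tv_sec) * 1e3 +
            (t1.tv_nsec - t0.tv_nsec) / 1e6) / ITERATIONS);

    mcapi_finalize(&status);
    return 0;
}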

From all the above we can conclude that the OpenMCAPI library is a worthy implementation of the MCAPI specification: it has a clear source code structure that eases porting, illustrative porting examples (the powerpc and arm platforms), a free license, and performance sufficient for most applications.

Questions and comments are welcome. They will be answered by the author of the article, Ruslan Filipovich, a programmer at the Promwad electronics design center.

Source: https://habr.com/ru/post/186806/

