This article first compares a design built from two discrete MCUs with a single-chip dual-core MCU (taking the LPC4350 as an example), and then introduces the basic concepts and key features of asymmetric dual-core MCUs. It then focuses on inter-core communication and several ways to implement it, especially control/status communication based on a message pool. Important details such as mutual exclusion between the cores and the initialization process are discussed next. Finally, two application models for dividing tasks between the two cores are proposed, each with an example.
Background and basic concepts
When developing an MCU application system, if a single MCU cannot meet the system's requirements, a very common approach is to use two or more MCUs and hand part of the "miscellaneous work" to a low-end "assistant" MCU. The shortcomings of using two MCUs are also obvious, however, especially in chip and PCB cost, system reliability, and power consumption. In addition, if the MCUs have different architectures, the project needs different development tools and developers with different skills. Consider a different approach: let a single MCU contain two cores, one for main control and one for auxiliary control, with the two cores architecturally compatible and able to communicate efficiently. In many cases this retains the power of a multi-MCU system while avoiding its shortcomings.
In fact, this is the defining characteristic of the "Asymmetric Multi-Processor (AMP)" architecture. AMP is the counterpart of "Symmetric Multi-Processor (SMP)", in which every processor has the same programming model and work is assigned mainly by balancing the load. The advantage of AMP lies in its fine-grained division of tasks: each core is matched to the work it suits best, which allows flexible adaptation to different scenarios and a good balance of cost, performance, and power consumption. AMP is also less difficult to program. In the MCU application field, therefore, AMP is more suitable than SMP.
Compared with two independent MCUs, the AMP architecture has many advantages. A key one is that adding a second core costs much less than adding a separate MCU. When the two core architectures are similar, the extra cost can even be comparable to adding one or two UARTs to the existing silicon. In addition, the two cores can run at the same clock frequency and can access on-chip resources equally through the bus matrix. In a discrete dual-MCU solution, by contrast, the assistant MCU often runs at a much lower clock frequency than the master, and the two communicate over low-speed serial links.
Next, we use the LPC4300 series from NXP Semiconductors, represented here by the LPC4350, as an example to briefly introduce AMP MCUs.
Features of asymmetric dual-core MCU
AMP MCUs are generally used in relatively large systems, which place higher demands on both functions and performance. Functionally, more peripherals must be supported. The LPC4350 contains 2 high-speed USB interfaces, 2 CAN interfaces, industrial Ethernet, a graphic LCD controller, and an SDHC interface. Together with some unique configurable-logic peripherals and many traditional peripherals, this makes it suitable for product development in many industries, such as industrial control, energy, medical, audio, automotive, motor control, and surveillance.
Performance improvement is the heart of an AMP MCU. The cores, memory, and bus architecture all have a crucial impact on performance. Figure 1 shows how the LPC4350 implements them.
Figure 1: LPC4350 core, memory and bus connection diagram
The first factor is the choice of cores. The LPC4350 is based on the 32-bit ARM Cortex-M4 and Cortex-M0 cores (hereinafter M4 and M0), and both cores can execute code at up to 204MHz. The M4 is known for its signal processing and floating-point capabilities. It can handle many applications that would otherwise require a DSP, and it inherits the control capabilities of the Cortex-M3. The M0, for its part, combines low cost, high energy efficiency, and strong processing capability, advantages that are quickly attracting developers away from 8/16-bit architectures. More importantly, the M4 is fully backward compatible with the M0, and both cores can be developed and debugged with the same set of development tools.
The second factor is the capacity and organization of the memory. The LPC4350 has up to 264KB of on-chip RAM. This RAM is divided into 4 groups, each connected to its own bus, rather than forming a single undivided block. Without this division, the two cores would compete for the same piece of RAM, and performance could be worse than with a single core. The LPC4350 also has two buses that connect to external parallel and serial memories, so there are 6 independent memory address spaces in total (the LPC4350 itself has no on-chip flash memory). On models with on-chip flash memory, the flash is likewise divided into two parts.
The last factor is the bus architecture. The LPC4350 contains an eight-layer bus matrix. It works like a set of crossbar switches that can connect the CPUs to the many slave devices, including the memories, in any combination. Assigning the bus connections sensibly, so that multiple masters (such as a CPU and the DMA controller) do not access the same memory or peripheral at the same time, maximizes the parallelism of the data streams and lets the device deliver its full performance.
Inter-core communication
Communication between the cores falls into two categories: control and status communication, and data communication. The former generally carries no data but often has stricter real-time requirements. The latter consists mainly of data buffers, usually with looser real-time requirements but large data volumes. Control/status communication is more general and resembles synchronization between tasks, so it is well suited to being implemented by system software behind a programming interface. Data communication is usually tied to a specific application (especially in its data structures) and needs to be tailored, so it is better left to application software, which defines its own data structures.
The cores communicate through shared RAM, and each core can trigger an interrupt on the other. Communication follows a "prepare the data, then trigger an interrupt" pattern, as shown in Figure 2. Alternatively, a core can poll the state of the shared RAM periodically.
Figure 2: Diagram of shared memory communication mode between cores
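The "prepare the data, then trigger an interrupt" pattern can be sketched on a host PC as follows. This is only an illustration under stated assumptions: `mailbox_t`, `mailbox_post`, `mailbox_take`, and the `demo_*` names are hypothetical, the mailbox is an ordinary variable standing in for an agreed location in shared RAM, and the inter-core interrupt is simulated by calling a handler function directly. A port to real hardware would replace the `notify` callback with the device's actual interrupt trigger and add the memory barriers the platform requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-slot mailbox; on a real device it would live at an
 * address in shared RAM that both cores agree on. */
typedef struct {
    volatile uint32_t command;   /* what the sender is signalling       */
    volatile uint32_t argument;  /* optional parameter                  */
    volatile uint32_t pending;   /* 1 = posted and not yet consumed     */
} mailbox_t;

/* Receiver side: called from the interrupt handler or from a polling loop.
 * Returns false if nothing is waiting. */
bool mailbox_take(mailbox_t *mb, uint32_t *command, uint32_t *argument)
{
    assert(mb != NULL && command != NULL && argument != NULL);
    if (!mb->pending) {
        return false;
    }
    *command = mb->command;
    *argument = mb->argument;
    mb->pending = 0u;            /* free the slot for the next message */
    return true;
}

/* Sender side: prepare the data first, then notify the other core.
 * notify stands in for "trigger the peer's interrupt"; pass NULL when the
 * receiver polls instead. Returns false if the slot is still occupied. */
bool mailbox_post(mailbox_t *mb, uint32_t command, uint32_t argument,
                  void (*notify)(void))
{
    assert(mb != NULL);
    if (mb->pending) {
        return false;
    }
    mb->command = command;
    mb->argument = argument;
    /* Assumption: real dual-core code needs a memory barrier here so the
     * payload is visible before the flag. */
    mb->pending = 1u;
    if (notify != NULL) {
        notify();
    }
    return true;
}

/* Host-side stand-ins for the shared RAM and the receiving core's ISR. */
mailbox_t demo_mailbox;
uint32_t demo_last_command;
uint32_t demo_last_argument;

void demo_peer_isr(void)
{
    (void)mailbox_take(&demo_mailbox, &demo_last_command, &demo_last_argument);
}
```

Calling `mailbox_post(&demo_mailbox, 1u, 42u, demo_peer_isr)` models the interrupt-driven path, while passing `NULL` as the last argument and calling `mailbox_take` later models the polling path.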
Next, we introduce the control/status communication scheme based on message queues and message pools.
Message queue: Set up two message queues, one for the M4 to send messages to the M0 and the other for the M0 to send messages to the M4. The addresses of the two queues must be agreed in advance. Each queue is circular and can be implemented either as a simple array with read and write indices or as a linked list. The array is simple to implement and has low overhead, but messages must be of fixed length, which makes it inconvenient to carry extra information. The array must also occupy a contiguous region of the shared memory area, which limits flexibility. The linked-list implementation uses pointers to link the messages. Besides the common list-control fields, each message can carry additional parameters that depend on its category, and message memory can be allocated flexibly by the memory management mechanism of the system software. The disadvantage is that it is more complex and has higher overhead. If dynamic memory management is involved, real-time performance will be far worse than with the array-based solution.
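The array-based variant can be sketched as a single-producer, single-consumer ring buffer. The names below (`msg_t`, `msg_queue_t`, `msg_queue_send`, `msg_queue_receive`, `MSG_QUEUE_LEN`) and the message layout are assumptions made for illustration, not taken from any vendor library. One queue object covers one direction, so a full setup would use two of them, each placed at its agreed address in shared RAM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSG_QUEUE_LEN 8u   /* hypothetical depth; one slot always stays empty */

/* Fixed-length message: an identifier plus one parameter word. */
typedef struct {
    uint32_t id;
    uint32_t param;
} msg_t;

/* One direction of traffic, for example M4 -> M0. */
typedef struct {
    volatile uint32_t head;        /* next slot to write; sender only   */
    volatile uint32_t tail;        /* next slot to read; receiver only  */
    msg_t slots[MSG_QUEUE_LEN];
} msg_queue_t;

void msg_queue_init(msg_queue_t *q)
{
    assert(q != NULL);
    q->head = 0u;
    q->tail = 0u;
}

/* Sender side. Returns false when the queue is full. */
bool msg_queue_send(msg_queue_t *q, const msg_t *m)
{
    assert(q != NULL && m != NULL);
    uint32_t head = q->head;
    uint32_t next = (head + 1u) % MSG_QUEUE_LEN;
    if (next == q->tail) {
        return false;              /* full: writing would overrun the reader */
    }
    q->slots[head] = *m;           /* store the message first ...            */
    /* Assumption: a real dual-core port needs a memory barrier here. */
    q->head = next;                /* ... then publish the new write index   */
    return true;
}

/* Receiver side. Returns false when the queue is empty. */
bool msg_queue_receive(msg_queue_t *q, msg_t *out)
{
    assert(q != NULL && out != NULL);
    uint32_t tail = q->tail;
    if (tail == q->head) {
        return false;              /* empty */
    }
    *out = q->slots[tail];
    q->tail = (tail + 1u) % MSG_QUEUE_LEN;
    return true;
}
```

Because each index is written by only one side, this sketch needs no lock between the cores. The cost is that one slot is always left unused to tell a full queue from an empty one, so a queue of `MSG_QUEUE_LEN` slots holds at most `MSG_QUEUE_LEN - 1` messages.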