Invented by Ramanjaneyulu Talla, Narendra Kumar Kataria, Citrix Systems Inc

The market for systems and methods of emulating a NIC for packet transmission on hardware RSS aware NICs in a multi-core system is growing rapidly as demand for high-performance networking continues to rise. With the increasing adoption of cloud computing, virtualization, and data-intensive applications, there is a need for efficient, scalable network infrastructure that can handle ever-increasing data traffic.

A NIC (Network Interface Card) is a hardware component that connects a computer to a network, enabling it to send and receive data packets. RSS (Receive Side Scaling) is a technology that allows a multi-core system to distribute incoming network traffic across multiple processor cores, thereby improving network performance. Emulating a NIC for packet transmission on hardware RSS aware NICs in a multi-core system involves creating a software layer that replicates the functionality of a physical NIC. This emulation layer intercepts network packets, processes them, and distributes them across multiple processor cores using RSS, enabling efficient utilization of the available processing power and improving overall network performance.

Several factors drive this market. First, the increasing demand for high-speed networking in data centers and enterprise networks is fueling the need for efficient packet transmission mechanisms; emulating a NIC on hardware RSS aware NICs allows better utilization of the available processing power, resulting in improved network performance and reduced latency. Second, the growing adoption of virtualization contributes to market growth: virtualization allows multiple virtual machines to run on a single physical server and share its resources, and NIC emulation enables efficient packet transmission between virtual machines, ensuring optimal network performance in virtualized environments. Finally, the rise of data-intensive applications such as big data analytics, machine learning, and video streaming is driving the need for high-bandwidth networking; efficient packet transmission lets these applications process and transfer large volumes of data without bottlenecks.

The market for these systems and methods is highly competitive, with several companies offering solutions in this space, including software-based virtual NICs, hardware-accelerated NIC emulation, and intelligent packet processing algorithms. As demand for high-speed networking continues to rise, the market is expected to expand further, driven by the adoption of virtualization technologies and data-intensive applications.

The Citrix Systems Inc invention works as follows

The “Emulating a NIC” approach for packet transmission on hardware RSS aware NICs allows each of a multitude of slave packet engines in a multi-core system to emulate a NIC locally for packet transmission, even though the actual NIC packet transmissions are handled only by a master engine. Each slave packet engine uses a software-implemented local transmission queue to track data in the device output queue, which is handled by the master engine on behalf of the slave engines. The master packet engine may transmit data from the queue, and as the status of the queue changes, the master and slave packet engines can use pointers to track which data has been transmitted, which packets have been drained, and which packets remain in the queue.

Background for Systems and Methods of Emulating a NIC For Packet Transmission On Hardware RSS Aware NICs In A Multi-Core System

In a multi-core appliance deployed between clients and servers, data packets traveling through the appliance can be handled by any number of cores in the system. Cores can process data packets and then send them toward their destinations. Normally, certain cores may not have information on the transmission of data packets after the packets have left those cores. In certain embodiments, some cores implement the transmission of data packets while others may not be aware of the status of the data they processed. To maintain information about the processed data, every core may need to make one or more copies of the data.

The present application is directed at systems and methods for emulating a NIC in order to transmit packets on hardware RSS-aware NICs within a multi-core system. A packet engine on a multi-core system may process network packets and send them to a queue for transmission, yet not know the status of those packets afterward, because the master core handles the actual transmission. To keep track of the packets it has processed, a core other than the master may need to make copies of all the data; otherwise it cannot know its own transmission statistics, because the master packet engine on the master core is the only one that can update and monitor the NIC's totals of packets or bytes sent.

The systems and methods described herein allow each packet engine to emulate a NIC locally for packet transmissions, even though actual NIC transmissions are handled only by a single master core. They allow each packet engine to use a locally implemented software transmission queue that mirrors the device output queue and to track the status of each field of data in that queue. The master packet engine handles the actual device output queue on behalf of the slave packet engines, while each packet engine implements a local representation, or version, of the device output queue. The master packet engine may transmit data from the device queue, and as the queue status changes, both the master and slave packet engines can use pointers to track which data has been transmitted, which packets have been drained, and which packets remain in the queue. These techniques allow each slave packet engine to keep track of the status of the data it has sent and to maintain its own statistics, even though the master engine is responsible for the transmissions. Using these methods, each packet engine on each core of the multi-core device can emulate the status of data transmitted over a NIC, which is normally controlled and handled by the packet engine on the master core alone.
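To make the mechanism concrete, here is a minimal sketch in C of such a per-engine local queue, assuming a fixed-size descriptor ring. The type and macro names (sw_desc, sw_txq, QSIZE) are illustrative assumptions, not names from the patent; only the pointer names rdinx, wrinx, and dirty echo the ones used later in this description.

```c
#include <stdint.h>

#define QSIZE 256            /* ring capacity; a power of two makes wrap-around cheap */

/* Software buffer descriptor: enough information for the master PE to
 * fill out a device buffer descriptor without touching the shared NSB. */
struct sw_desc {
    void    *nsb;            /* link back to the buffer (NSB) so it can be freed later */
    uint64_t addr;           /* DMA address of the packet data */
    uint16_t len;            /* packet length in bytes */
};

/* Per-packet-engine local emulation of the device output queue.
 * Indices are free-running counters, masked with (QSIZE - 1) on access. */
struct sw_txq {
    struct sw_desc ring[QSIZE];
    uint32_t wrinx;          /* owning slave PE: next slot to write */
    uint32_t rdinx;          /* master PE: next slot to drain into the device queue */
    uint32_t dirty;          /* master PE: first drained slot not yet confirmed sent */
};
```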

An interface master operating on the master packet engine can read descriptors placed by a slave core's packet engine and send data out from the queue according to the descriptors that were read. In this case, the master packet engine does not have to consult the fields of the shared data structures, also known as NSBs, to process the information; it can instead rely on the information already processed by the slave packet engines to send the data out of the queue. The slave packet engines can modify software descriptors to identify the status of specific NSB fields, and the master PE can then use the descriptors supplied by these source PEs to transmit data from the queue.
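Continuing the sw_txq sketch above, the master-side drain might look like the following; device_post() is an assumed stand-in for whatever routine fills out a hardware descriptor on the device output queue, not a name from the patent.

```c
/* Assumed stand-in for filling out a hardware descriptor on the device queue. */
extern void device_post(uint64_t addr, uint16_t len);

/* Drain up to `quota` descriptors from one slave PE's local queue into the
 * device output queue, copying each software descriptor's values into a
 * hardware descriptor, then advancing rdinx past the drained entries.
 * The NSB itself is never consulted here. */
static uint32_t master_drain(struct sw_txq *q, uint32_t quota)
{
    uint32_t drained = 0;

    while (drained < quota && q->rdinx != q->wrinx) {
        const struct sw_desc *d = &q->ring[q->rdinx & (QSIZE - 1)];
        device_post(d->addr, d->len);
        q->rdinx++;              /* drained, but not yet confirmed sent */
        drained++;
    }
    return drained;
}
```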

The master PE can maintain and use one or more queues to identify the NSBs stored in the hardware descriptors, for the purpose of processing transmission completions. These queues could contain information mapping hardware descriptors to source PEs. The transmission processing function, or transmission procedure tx_proc( ), may use this mapping to update the source PEs' pointers so that the corresponding PEs can free up an NSB field once the master PE has transmitted its data from the device queue.
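Continuing the same sketch, a hypothetical completion path in the spirit of tx_proc( ) could advance the per-PE dirty pointer and free the transmitted NSBs; buffer_free() and tx_done_free() are assumed names, with buffer_free() standing in for whatever routine returns a buffer to its owning PE.

```c
/* Assumed stand-in for returning a transmitted buffer to its owning PE. */
extern void buffer_free(void *nsb);

/* After the device reports that `n` packets from this PE's drained range
 * have been sent, free their NSBs and advance the dirty pointer so the
 * owning slave PE sees those slots as reusable. */
static void tx_done_free(struct sw_txq *q, uint32_t n)
{
    while (n-- && q->dirty != q->rdinx) {
        buffer_free(q->ring[q->dirty & (QSIZE - 1)].nsb);
        q->dirty++;
    }
}
```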

Update pointers, such as the “rd” and “wr” pointers described below, can be updated in batches. To minimize invalidation of descriptors in the cache, queue pointers can be used to indicate transmission and transmission completion. The software buffer descriptors can be defined so that each stores the information necessary to fill out the device buffer descriptors. In one example, a table can be used; this table can be filled by the source PEs that process the packets. PE-0 can read a buffer descriptor, copy its values into the device buffer descriptor, and send out the packet. This technique can eliminate the need to share the NSBs with PE-0 or the other PEs.

In one embodiment, three pointers are used to maintain the transmission descriptors between the master packet engine and the slave packet engines in the multi-core system. The transmission procedure on any PE may update the buffer descriptor for the next packet and increment the pointer “wrinx”. The PE can maintain a link to the NSB while updating the software descriptor fields; this link may later be used to free up the NSB once it has been transmitted by the master PE. When the master PE drains packets, it may start from “rdinx” and drain packets up to a quota or, if it is reached first, up to “wrinx”.
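A sketch of that slave-side step under the same sw_txq layout: fill the software descriptor for the next packet, keep the link to the NSB, then publish it by incrementing wrinx. A single producer per queue is assumed, since each slave PE owns its own local queue; slave_enqueue is an illustrative name.

```c
/* Write one software descriptor and publish it to the master PE.
 * Returns 0 on success, -1 if no freed slot is available yet. */
static int slave_enqueue(struct sw_txq *q, void *nsb, uint64_t addr, uint16_t len)
{
    if (q->wrinx - q->dirty == QSIZE)    /* ring full: wait for tx completions */
        return -1;

    struct sw_desc *d = &q->ring[q->wrinx & (QSIZE - 1)];
    d->nsb  = nsb;                       /* keep the link used to free the NSB later */
    d->addr = addr;
    d->len  = len;
    q->wrinx++;                          /* publish: master PE may now drain this slot */
    return 0;
}
```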

When the buffer descriptor queue of a packet engine, for example PE-0, is empty, both the “rdinx” and “wrinx” pointers point to the location where the queue starts. If the PE processes 15 data packets and adds the 15 processed packets to the queue, the pointer “wrinx” moves forward past the 15 new entries while “rdinx” retains the starting position of the queue. If the master packet engine then drains 8 of those packets and places them in the appliance's outgoing queue, the first drained packet can be marked with the pointer “dirty_pe0”, the first packet that is not drained with the pointer “rdinx”, and the last packet that has been neither drained nor transmitted with the pointer “wrinx”. When the master PE transmits 5 of the drained packets, it can indicate that 5 of the 8 drained packets have been sent out of the device's queue by moving the pointer “dirty_pe0” to the first drained but not-yet-transmitted packet, leaving “rdinx” unchanged. In this way, the transmission procedure moves the “dirty_pe0” pointer to indicate the first non-transmitted packet, and the “dirty_pe1”, “rdinx” and “wrinx” pointers in the queue for PE-1 behave in the same way as those in the queue for PE-0.
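Under the sw_txq sketch above, the pointer motion in this example would be as follows (free-running indices, ring capacity at least 16):

```c
/* Pointer motion for the worked example above:
 *
 *   state                               wrinx  rdinx  dirty
 *   queue empty                            0      0      0
 *   PE processes 15 packets              15      0      0
 *   master drains 8 into device queue    15      8      0
 *   device transmits 5 of the 8          15      8      5
 *
 * Slots [0,5) are free for reuse, [5,8) are drained but awaiting
 * transmit confirmation, and [8,15) still wait in the local queue. */
```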

The master packet engine can queue and transmit the data processed by the slave packet engines while each slave maintains statistics about its own processed data. This technique allows the slave PEs to emulate the status and transmission of data in the device queue, updating themselves as the status changes.

In one aspect, this disclosure describes a method of emulating a network interface card (NIC) by a single core in a multi-core system. A slave packet engine executed by a first core of a multi-core device can process a first packet to be transmitted from the device. The slave packet engine can write the first processed packet to a local queue of the slave engine. In response to writing the first packet into the local queue, the slave engine can increment the write pointer of its local queue. The first processed packet may then be read from the slave engine's local queue by a master packet engine executed on a second core of the plurality of cores of the device. After reading the first processed packet from the slave engine's local queue, the master packet engine can increment the read pointer of the local queue. The master packet engine can transmit the first processed packet from the device and, upon transmitting it, may increment the transmission pointer of the slave packet engine's local queue.

In one embodiment, the slave packet engine writes information about the first packet to a data structure shared between the slave engine and the master engine. In another embodiment, the method also includes the master packet engine reading the information about the first packet from the shared data structure. In a further embodiment, the method also includes the slave engine deleting the information about the packet from the shared data structure in response to the master packet engine incrementing the transmission pointer of the local queue.

In some embodiments, the slave packet engine processes a second packet for transmission by the device, writes the second processed packet to its local queue, and increments the write pointer of the local queue after writing the second packet, all before the master packet engine reads the first processed packet from the slave engine's queue. In another embodiment, the master packet engine reads the second processed packet from the local queue and increments the read pointer of the slave engine's local queue after reading it, before transmitting the first processed packet.

In another embodiment, the method includes the master packet engine writing the first processed packet to a transmission queue of the device, and transmitting the first processed packet from the device's transmission queue. In such embodiments, the transmission pointer of the slave packet engine's local queue is incremented upon transmission of the first processed packet from the device's transmission queue.

In a further embodiment, a second slave packet engine is executed by a third core of the plurality of cores. The second slave engine processes a third packet and writes it into a local queue of the second slave engine, incrementing a write pointer of that local queue after writing the third packet. The master packet engine reads the third processed packet from the local queue of the second slave engine after reading the first processed packet, and before transmitting the first processed packet from the device. The master packet engine can also increment a read pointer of the second slave engine's local queue after reading the third processed packet.

In some embodiments, the slave packet engine compiles packet-engine-specific statistics based on the read pointer and the transmission pointer of its local queue. In other embodiments, the method involves the slave engine performing a congestion-handling protocol upon detecting a difference between the value of the write pointer and the value of either the read pointer or the transmission pointer of the slave packet engine's local queue.
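A sketch of how such pointer-derived bookkeeping might look under the sw_txq layout above; the function names and the three-quarters threshold are illustrative assumptions, not from the patent.

```c
/* Packets written by this PE but not yet drained by the master. */
static uint32_t txq_backlog(const struct sw_txq *q)
{
    return q->wrinx - q->rdinx;
}

/* Packets drained into the device queue but not yet confirmed sent. */
static uint32_t txq_in_flight(const struct sw_txq *q)
{
    return q->rdinx - q->dirty;
}

/* Illustrative congestion test: trigger the congestion-handling protocol
 * when outstanding work fills most of the ring. */
static int txq_congested(const struct sw_txq *q)
{
    return (q->wrinx - q->dirty) > (3 * QSIZE) / 4;
}
```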

In another aspect, this disclosure describes a system that emulates a network interface card (NIC) using a core within a multi-core computer system. The system includes a multi-core device and a slave packet engine, executed by a first core of the plurality of cores, that has a local queue. The slave packet engine is configured to process a first packet to be transmitted from the device, to write the first processed packet to its local queue, and to increment a write pointer of the local queue upon writing the first processed packet to it. The system also includes a master packet engine executed by a second core of the plurality. The master packet engine is configured to read the first processed packet from the local queue of the slave engine and to increment a read pointer of that local queue upon reading the packet. The master packet engine is further configured to transmit the first processed packet from the device and to increment a transmission pointer of the slave packet engine's local queue upon transmitting the first processed packet.

In some embodiments, the slave packet engine is configured to write information about the first packet into a data structure shared between the slave engine and the master engine. In another embodiment of the system, the master packet engine is configured to read that information from the shared data structure. In a further embodiment, the slave engine is configured to delete the information about the packet from the shared data structure when the master packet engine increments the transmission pointer of the local queue.

In another embodiment of the system, the slave engine is configured to process a second packet for transmission, write the second processed packet to its local queue, and increment the write pointer of the local queue in response to writing the second packet, prior to the master engine reading the first processed packet from the slave's queue. In another embodiment, the master packet engine is configured to read the second processed packet from the slave packet engine's queue and to increment the read pointer of the slave engine's local queue after reading the second processed packet and before transmitting the first processed packet from the device.

In some embodiments, the device also includes a transmission queue. In these embodiments, the master packet engine can also be configured to write the first processed packet into the transmission queue and then transmit the first processed packet from the device's transmission queue. In such embodiments, the transmission pointer of the slave packet engine's local queue is incremented upon transmission of the first processed packet from the device's transmission queue.

In another embodiment, a second slave packet engine is executed by a third core of the plurality of cores of the device. The second slave engine is configured to process a third packet for transmission from the device, write the third processed packet to a local queue of the second slave engine, and increment a write pointer of that local queue. In such embodiments, the master packet engine is configured to read the third processed packet from the second slave engine's queue after reading the first processed packet from the first slave engine's local queue, and before transmitting the first processed packet from the device.

In some embodiments, the slave packet engine is configured to compile packet-engine-specific statistics using the write pointer and the read pointer of the local queue. Other embodiments of the system include the slave packet engine performing a congestion-handling protocol on the basis of detecting a difference between the value of the write pointer and the value of either the read pointer or the transmission pointer of the local queue.
