Invented by G. Glenn Henry, Terry Parks, Darius D. Gaskins, Via Technologies Inc
The Market For Dynamic Reconfiguration Multi-Core Processors
Manufacturers such as Intel and AMD have created multi-core designs in order to provide regular performance improvements for general-purpose processors. These systems consist of multiple CPUs on one chip, each offering a specific type of performance and power efficiency.
These designs let a workload run on the core best suited to its performance and power needs, which often involves migrating execution between cores. Unfortunately, migration incurs substantial performance and energy overhead, which limits how often it can be done.
The market for dynamic reconfiguration multi-core processors is expected to grow rapidly over the coming years, as these processors enable users to perform multiple tasks at once. They are used in a range of applications such as desktops, mobile PCs, smartphones, tablets and servers.
A major factor driving the growth of the Dynamic Reconfiguration Multi-core Processors market is the surge in popularity of smartphones with these types of processors. These smartphones can handle graphically intensive games and high resolution videos without draining their battery, allowing users to utilize them as primary entertainment sources.
These processors boast multiple cores running at different speeds, providing more processing power and performance than standard CPUs. These chips come in various forms such as dual, quad, or octa cores for added versatility.
One of the most popular applications for these processors is image processing technology. They are capable of running complex algorithms at high speeds, which can significantly enhance image processing results.
Additionally, processors can also be utilized for audio and video processing. As a result, these devices are expected to experience rapid growth in the coming years as they gain wider acceptance and become increasingly sought-after among consumers.
Another key factor driving demand for Dynamic reconfiguration multi-core processing is an increasing need for powerful computing devices with extensive functionality. This is owing to their capability of performing complex tasks at high speed, leading to increased productivity across various industries.
Furthermore, these processors can be used for video and image processing. They can run complex algorithms at high speed and offer improved results across a wide range of applications.
However, one major drawback of these processors is their high cost; this could restrict their market expansion.
To counter this limitation, dynamic reconfiguration technology can be used to implement microarchitectural heterogeneity within the cores of these processors. This helps reduce migration overhead and allows for fine-grained switching.
The market for dynamic reconfiguration multi-core processors is expected to expand rapidly due to rising demand for high-performance computing devices. These processors can execute separate instruction streams on each core simultaneously, increasing functionality and speed. They find applications in desktops, mobile PCs, smartphones, tablets, servers, and workstations.
They can also be employed in numerous industries such as healthcare and automotive, and they can manage complex tasks such as video editing, 3D gaming, and encoding.
Dynamic reconfiguration multi-cores are fueling the market due to their low power dissipation. This technology is especially advantageous for computers and mobile devices with large screens since they use less energy and offer faster performance compared to single-core chips.
Another factor fuelling this segment’s growth is widespread adoption by device and chip makers such as Xiaomi, Qualcomm, and Samsung. These companies have made multi-core processors popular for smartphones and laptops so that they can handle complex tasks like graphics-intensive games and videos without rapidly draining the battery.
However, these processors have limitations. Performance does not scale in proportion to core count (a dual-core chip does not run at double the speed of a single-core CPU), and multi-core chips cost more than single-core CPUs; this could hinder their market growth.
Thus, processors are primarily utilized in consumer electronics and healthcare sectors. They have found applications such as medical diagnostics, industrial controls, and video processing.
Multi-core processors are becoming more and more prevalent in applications and heterogeneous computing setups, as combining a powerful core with a small one can provide greater functionality for various uses and use cases.
Dynamic reconfiguration multi-core processors (DRMPCs) are being adopted by more and more companies due to their advantages such as enhanced performance, higher energy efficiency and lower cost. They’re being utilized in industries like consumer electronics, aerospace and automotive with notable reputations for superior reliability and performance.
The market for Dynamic Reconfiguration Multi-Core Processors is anticipated to be driven by a need for enhanced performance and efficiency in devices, as well as rising smartphone sales which should further fuel its growth over the coming years.
Additionally, the market for Dynamic Reconfiguration Multi-Core Processing is being driven by an increase in avionics systems. These processors boast higher performance and functionality which improve system efficiency while decreasing costs. Unfortunately, managing their manufacturing process can prove challenging, leading to lower chip production yields.
Therefore, manufacturers are investing in multi-core processors to meet the demands of various industries. This trend is expected to persist into the foreseeable future.
Avionics systems rely on these processors for tasks such as radar and navigation, necessitating them to have high reliability and security levels.
These systems are being employed to boost system efficiency and reduce power consumption. Compared to single-core processors, multi-core processors provide improved performance and can handle more and heavier applications simultaneously.
Dynamic voltage and frequency scaling (DVFS) is one way to accomplish this goal. DVFS, which is widely used in multi-core processors, improves power efficiency by lowering the supply voltage and clock frequency when full performance is not needed, and raising them again when the workload demands it.
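As a rough illustration of why scaling voltage and frequency together saves power, the sketch below uses the standard first-order model for dynamic CMOS power, P = alpha * C * V^2 * f. The function name and the two operating points are illustrative assumptions, not figures for any particular processor.

```python
def dynamic_power(alpha: float, capacitance: float, voltage: float, freq_hz: float) -> float:
    """First-order dynamic power model: P = alpha * C * V^2 * f."""
    return alpha * capacitance * voltage ** 2 * freq_hz


# Hypothetical operating points (illustrative values only).
full = dynamic_power(alpha=0.2, capacitance=1e-9, voltage=1.0, freq_hz=2.0e9)
scaled = dynamic_power(alpha=0.2, capacitance=1e-9, voltage=0.8, freq_hz=1.0e9)

# Halving the frequency and dropping the voltage by 20% cuts dynamic power
# to roughly 0.8^2 * 0.5 = 32% of the full-speed figure in this model.
ratio = scaled / full
```

The model ignores static (leakage) power, so real savings will differ.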
The technology can also be employed to reduce the power consumption of interrupt vectors. When transitioning from a parked state to an active one, interrupt vectors are remapped to cores that are running.
Interrupt vectors can be collapsed and reconfigured through threshold detection operations, which may be based on real-time workload measurements or other considerations.
Further, remapping interrupt vectors involves decreasing their frequency using either a scaling operation or fan-out operation.
When it comes to energy consumption, the granularity of the reconfigurable logic plays an integral role. The more coarse-grained the architecture, the less configuration information has to be transferred, and thus the less power reconfiguration requires.
The market for dynamic reconfiguration multi-core processors is expected to expand rapidly over the coming years due to an increasing need for high-performance computing devices with extensive functionality. Furthermore, rising demands for advanced technologies within industry and increased utilization of these processors in automotive applications are further fueling growth within this sector.
The global dynamic reconfiguration multi-core processors market is forecast to reach USD 2.22 billion by 2029. The Asia Pacific region is projected to dominate this space with a revenue share of xx% by then, with developing economies such as China and India driving growth during the forecast period.
Companies and organizations often opt for multi-core processors in their products and applications due to their energy efficiency compared to single core processors, increased application performance, and reduced hardware costs.
In addition to the advantages mentioned, multi-core processors also come with some drawbacks. They may not operate as quickly as a standard CPU and may cost more than single core models. Furthermore, operating a multi-core chip requires additional resources in terms of power consumption, cooling requirements and data synchronization – leading to additional expenses.
Unfortunately, some users may struggle to take advantage of a multi-core processor in their software applications. This is because multi-core processors are not designed for easy emulation by standard PCs or workstations; rather, separate operating systems and application software must be developed in order to fully exploit their cores.
Despite these drawbacks, the market for dynamic reconfiguration multi-core processors is projected to experience rapid growth over the coming years. A variety of industries are anticipated to incorporate this technology into their products and applications in an effort to boost efficiency and reduce costs. Furthermore, an increasing number of people using mobile devices such as smart phones and tablets is fueling demand for these items.
The Via Technologies Inc invention works as follows
A microprocessor includes a plurality of processing cores and a configuration register that indicates whether each of the plurality of processing cores is enabled or disabled. Each enabled one of the plurality of processing cores is configured to read the configuration register in a first instance to determine which of the plurality of processing cores is enabled or disabled and generate a respective configuration-related value based on the read of the configuration register in the first instance. The configuration register is then updated to indicate that a previously enabled one of the plurality of processing cores is disabled. Each enabled one of the plurality of processing cores is configured to read the configuration register in a second instance to determine which of the plurality of processing cores is enabled or disabled and generate the respective configuration-related value based on the read of the configuration register in the second instance.
Background for Dynamic reconfiguration multi-core processors
Multi-core microprocessors are gaining popularity due to their performance benefits. They have been made possible by the rapid decrease in the dimensions of semiconductor devices, which has increased transistor density. The presence of multiple cores within a microprocessor has created the need for the cores to communicate with one another in order to implement various features, such as power management and cache management.
Historically, architectural programs (e.g., operating systems or application programs) running on multi-core processors have communicated using semaphores located in a system memory that is architecturally addressable by all the cores. While this may be sufficient for some purposes, it may not be sufficient for others.
In one aspect, the present invention provides a microprocessor. The microprocessor includes a plurality of processing cores and a configuration register that indicates whether each of the plurality of processing cores is enabled or disabled. Each enabled one of the plurality of processing cores is configured to read the configuration register in a first instance to determine which of the plurality of processing cores is enabled or disabled and generate a respective configuration-related value based on the read of the configuration register in the first instance. The configuration register is then updated to indicate that a previously enabled one of the plurality of processing cores is disabled. Each enabled one of the plurality of processing cores is configured to read the configuration register in a second instance to determine which of the plurality of processing cores is enabled or disabled and generate the respective configuration-related value based on the read of the configuration register in the second instance.
In another aspect, the present invention provides a method for dynamically reconfiguring a multi-core microprocessor having a plurality of processing cores. The method includes reading, by each enabled one of the plurality of processing cores, a configuration register in a first instance to determine which of the plurality of processing cores is enabled or disabled. The method also includes generating, by each enabled one of the plurality of processing cores, a respective configuration-related value based on said reading the configuration register in the first instance. The method also includes updating the configuration register to indicate that a previously enabled one of the plurality of processing cores is disabled. The method also includes reading, by each enabled one of the plurality of processing cores, the configuration register in a second instance to determine which of the plurality of processing cores is enabled or disabled. The method also includes generating, by each enabled one of the plurality of processing cores, the respective configuration-related value based on said reading the configuration register in the second instance.
In yet another aspect, the present invention provides a computer program product encoded in at least one non-transitory computer usable medium for use with a computing device. The computer program product comprises computer usable program code embodied in said medium for specifying a microprocessor. The computer usable program code includes first program code for specifying a plurality of processing cores and second program code for specifying a configuration register that indicates whether each of the plurality of processing cores is enabled or disabled. Each enabled one of the plurality of processing cores is configured to read the configuration register in a first instance to determine which of the plurality of processing cores is enabled or disabled and generate a respective configuration-related value based on the read of the configuration register in the first instance. The configuration register is then updated to indicate that a previously enabled one of the plurality of processing cores is disabled. Each enabled one of the plurality of processing cores is configured to read the configuration register in a second instance to determine which of the plurality of processing cores is enabled or disabled and generate the respective configuration-related value based on the read of the configuration register in the second instance.
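The read, update, and re-read sequence summarized above can be sketched in a few lines. This is a minimal model under stated assumptions: the configuration register is treated as a bitmask with one enabled bit per core, and the "configuration-related value" is taken to be a core's rank among the enabled cores. That choice of value, and the names enabled_cores and config_value, are hypothetical and are not taken from the patent text.

```python
def enabled_cores(config_reg: int, num_cores: int) -> list[int]:
    """Return the indices of the cores whose enabled bit is set."""
    return [i for i in range(num_cores) if (config_reg >> i) & 1]


def config_value(config_reg: int, core: int, num_cores: int) -> int:
    """Hypothetical configuration-related value: rank among enabled cores."""
    cores = enabled_cores(config_reg, num_cores)
    if core not in cores:
        raise ValueError(f"core {core} is disabled")
    return cores.index(core)


NUM_CORES = 4
config_reg = 0b1111  # first instance: all four cores enabled

first = {c: config_value(config_reg, c, NUM_CORES)
         for c in enabled_cores(config_reg, NUM_CORES)}

config_reg &= ~(1 << 1)  # register updated: core 1 is now disabled

# Second instance: every remaining enabled core re-reads the register
# and regenerates its value from the new contents.
second = {c: config_value(config_reg, c, NUM_CORES)
          for c in enabled_cores(config_reg, NUM_CORES)}
```

In this sketch, cores 2 and 3 end up with different values after core 1 is disabled, which is the point of regenerating the value on the second read.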
Referring now to FIG. 1, a block diagram depicting a multi-core microprocessor 100 is shown. The microprocessor 100 includes a plurality of processing cores denoted 102A, 102B through 102N, which are referred to collectively as processing cores 102, or simply cores 102, and individually as processing core 102, or simply core 102. Preferably, each core 102 includes one or more functional units (not shown), which may include an instruction cache, an instruction translation unit or instruction decoder, a register renaming unit, reservation stations, data caches, execution units, a memory subsystem, and a retire unit with a reorder buffer. Preferably, the cores 102 include a superscalar, out-of-order execution microarchitecture. In one embodiment, the microprocessor 100 is an x86 architecture microprocessor, although other embodiments are contemplated in which the microprocessor 100 conforms to another instruction set architecture.
The microprocessor 100 also includes an uncore portion 103 that is coupled to the cores 102 and is distinct from them. The uncore 103 includes a control unit 104, fuses 114, a private random access memory (PRAM) 116, and a shared cache memory 119, for example a level-2 (L2) and/or level-3 (L3) cache memory shared by the cores 102. Each core 102 can read data from and write data to the uncore 103 via a respective address/data bus 126, which provides a non-architectural address space (also referred to as private or microarchitectural address space) to the shared resources of the uncore 103. The PRAM 116 is private, or non-architectural, in the sense that it is not within the architectural user program address space of the microprocessor 100. In one embodiment, the uncore 103 includes arbitration logic that arbitrates requests from the cores 102 for access to uncore 103 resources.
Each of the fuses 114 is either blown or not blown. When a fuse 114 is not blown, it has low impedance and readily conducts electrical current. When a fuse 114 is blown, it has high impedance and does not readily conduct electrical current. Each fuse 114 has an associated sense circuit that evaluates the fuse 114, i.e., senses whether the fuse 114 conducts a high current or low voltage (not blown, e.g., logical zero, or clear) or a low current or high voltage (blown, e.g., logical one, or set). A fuse 114 may be blown during manufacture of the microprocessor 100 and, in some embodiments, may also be blown after manufacture. Preferably, once a fuse 114 is blown, it cannot be unblown. A fuse 114 may be, for example, a polysilicon fuse that is blown by applying a sufficiently high voltage across it, or a nickel-chromium fuse that is blown using a laser. At power up, the sense circuit senses the fuse 114 and provides its evaluation to holding registers of the microprocessor 100. When the microprocessor 100 is released from reset, the cores 102 (e.g., microcode) read the holding registers to determine the values of the sensed fuses 114. In one embodiment, updated values may be scanned into the holding registers before the microprocessor 100 is released from reset, which effectively updates the fuse 114 values. This is particularly useful for testing and/or debugging purposes, as in the embodiments of FIGS. 22 and 23.
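The fuse-sensing flow described above can be modeled in a few lines. The sketch below is a simplification under stated assumptions: the function names, the one-bit-per-fuse packing order, and the rule that a scanned-in value replaces the whole holding register are hypothetical and serve only to illustrate the sequence (sense at power up, optional scan-in override, read after reset release).

```python
from typing import Optional


def sense_fuses(blown: list[bool]) -> int:
    """Pack sensed fuse states into a holding-register value.

    A blown fuse reads as logical one (set); an unblown fuse reads as zero.
    Bit i of the result corresponds to fuse i (assumed ordering).
    """
    value = 0
    for i, is_blown in enumerate(blown):
        if is_blown:
            value |= 1 << i
    return value


def holding_register_at_reset_release(blown: list[bool],
                                      scanned_in: Optional[int] = None) -> int:
    """Value microcode would read once the part is released from reset.

    If a test or debug flow scanned a value into the holding register before
    reset release, that value is used instead of the sensed fuse states.
    """
    sensed = sense_fuses(blown)
    return sensed if scanned_in is None else scanned_in


fuses = [True, False, True, False]  # fuses 0 and 2 are blown
normal = holding_register_at_reset_release(fuses)
debug = holding_register_at_reset_release(fuses, scanned_in=0b0110)
```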
Additionally, in one embodiment the microprocessor 100 includes a distinct local Advanced Programmable Interrupt Controller (APIC) associated with each core 102. In one embodiment, the local APICs conform architecturally to the description of a local APIC in the Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, May 2012, by the Intel Corporation of Santa Clara, Calif., particularly section 10.4. In particular, each local APIC includes an APIC ID register that holds an APIC ID and an APIC base register that includes a bootstrap processor (BSP) flag, whose generation and uses are described below, particularly with reference to FIGS. 14-16 and FIGS. 22 and 23.
The control unit 104 comprises hardware, software, or a combination thereof. The control unit 104 includes a hardware semaphore 118 (described in detail below with regard to FIGS. 17-20), a status register 106, a configuration register 112, and a respective sync register 108 for each core 102. Preferably, each of these uncore 103 entities is addressable by each core 102 at a distinct address within the non-architectural address space, which allows microcode to read and write it.
Each sync register 108 can be written by its respective core 102. The status register 106 can be read by each core 102. The configuration register 112 can be read by each core 102 and written indirectly (via the control word of FIG. 2, as described below). The control unit 104 preferably also includes interrupt logic (not shown) that generates a respective interrupt signal (INTR) 124 to each core 102, which the control unit 104 uses to interrupt that core 102. The interrupt sources in response to which the control unit 104 generates an interrupt 124 to a core 102 may include external interrupt sources, such as the x86 architecture SMI and NMI interrupt sources, or bus events, such as assertion or de-assertion of the x86-style bus signal STPCLK. Additionally, each core 102 can send an inter-core interrupt 124 to each of the other cores 102 by writing to the control unit 104. Unless otherwise specified, the inter-core interrupts described herein are non-architectural interrupts requested by microcode of a core 102 via a microinstruction. They are distinct from architectural inter-core interrupts, which system software requests via architectural instructions. The control unit 104 can also generate an interrupt 124 to the cores 102 (a sync interrupt) when a sync condition occurs (see, e.g., FIG. 21 and block 334 of FIG. 3). The control unit 104 also generates a respective core power control signal (PWR) 128 to each core 102, which controls whether the core 102 receives power. Using the PWR signal 128, the control unit 104 can turn off power to a core 102 to put it into a deeper sleep state and later turn power back on to wake it up.
A core 102 may write to its sync register 108 with the synchronization bit set (see S bit 222 of FIG. 2), which is referred to as a sync request, or synchronization request. In one embodiment, the sync request asks the control unit 104 to put the core 102 to sleep and to wake it up when a sync condition occurs and/or when a specified wakeup event occurs. A sync condition occurs when all the enabled cores 102 of the microprocessor 100 (see enabled bits 254 of FIG. 2), or a specified subset of the enabled cores 102 (see core set field 228 of FIG. 2), have written the same sync condition to their respective sync registers 108. The sync condition is specified by a combination of the C bit 224, the sync condition or C-state field 226, and the core set field 228 of FIG. 2, described in greater detail below with regard to the S bit 222. In response to the sync condition, the control unit 104 wakes all the cores 102 that are waiting for it, i.e., those that have requested it. Alternatively, the cores 102 may request that only the last core 102 to write the sync request be awakened (see sel wake bit 214 of FIG. 2). In another embodiment, the sync request does not ask that the core 102 be put to sleep. Instead, it asks the control unit 104 to interrupt the cores 102 when the sync condition occurs, as described in more detail below with regard to FIGS. 3 and 21.
Preferably, when the control unit 104 detects that a sync condition has occurred (due to the last core 102 writing the sync request to its sync register 108), the control unit 104 puts the last core 102 to sleep, i.e., turns off the clock 122 to the last-writing core 102, and then wakes up all the cores 102 simultaneously, i.e., turns on the clocks 122 to all the cores 102. In this way, all the cores 102 are awakened, i.e., have their clocks 122 turned on, on exactly the same clock cycle. This can be especially advantageous for certain operations, such as debugging (see FIG. 5, in which the cores 102 wake up on the exact same clock cycle). In one embodiment, the uncore 103 includes a single phase-locked loop (PLL) that produces the clock signals 122 provided to the cores 102. In other embodiments, the microprocessor 100 includes multiple PLLs that produce the clock signals 122 for the cores 102.
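The behavior described above (each core's clock is gated when it writes a sync request, and all clocks are turned back on together once the last core writes) can be illustrated with a small software model. This is only a sketch: the ControlUnit class, its attribute names, and the use of a counter as a stand-in for clock cycles are assumptions made for illustration, not the hardware design.

```python
class ControlUnit:
    """Toy model of the sync-register logic; sync conditions are small ints."""

    def __init__(self, enabled_cores):
        self.enabled = set(enabled_cores)
        self.sync_regs = {}  # core -> sync condition it wrote
        self.clock_on = {c: True for c in self.enabled}
        self.cycle = 0       # stand-in for a clock-cycle counter
        self.wake_cycle = {} # core -> cycle on which its clock was re-enabled

    def write_sync(self, core, condition):
        """A core writes a sync request; its clock is gated until the sync occurs."""
        self.cycle += 1
        self.sync_regs[core] = condition
        self.clock_on[core] = False  # even the last writer is put to sleep first
        if self._sync_condition():
            self._wake_all()

    def _sync_condition(self):
        """True when every enabled core has written the same condition."""
        written = {self.sync_regs.get(c) for c in self.enabled}
        return None not in written and len(written) == 1

    def _wake_all(self):
        """Turn all clocks back on in the same modeled cycle."""
        for c in self.enabled:
            self.clock_on[c] = True
            self.wake_cycle[c] = self.cycle
        self.sync_regs.clear()


cu = ControlUnit([0, 1, 2])
cu.write_sync(0, 5)
cu.write_sync(2, 5)
asleep_before = [c for c in sorted(cu.enabled) if not cu.clock_on[c]]
cu.write_sync(1, 5)  # last writer triggers the sync condition
```

After the third write, every core records the same wake cycle, which mirrors the "exactly the same clock cycle" property the text emphasizes.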
Control, Status, and Configuration Words
Referring now to FIG. 2, a block diagram illustrating a control word 202, a status word 242, and a configuration word 252 is shown. A core 102 writes a value of the control word 202 to its sync register 108 of the control unit 104 of FIG. 1 to make an atomic request to synchronize (sync) with all the other cores 102 of the microprocessor 100, or with a specified subset thereof. A core 102 reads a value of the status word 242 from the status register 106 to determine the status information described herein. A core 102 reads a value of the configuration word 252 from the configuration register 112 of the control unit 104 and uses it as described below.
The control word 202 includes a wakeup events field 204, a sync control field 206, and a power gate (PG) bit 208. The sync control field 206 includes various bits or subfields that control the sleeping of the core 102 and/or the syncing of the core 102 with other cores 102. The sync control field 206 includes a sleep bit 212, a selective wakeup (sel wake) bit 214, an S bit 222, a C bit 224, a sync condition or C-state field 226, a core set field 228, a force sync bit 232, a selective kill (sel kill) bit 234, and a core disable bit 236. The status word 242 includes a wakeup events field 244, a lowest common C-state field 246, and an error field 248. The configuration word 252 includes one enabled bit 254 per core 102 of the microprocessor 100, as well as a local core number field 256 and a number field 258.
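To make the control word concrete, the sketch below packs and unpacks the fields named in the paragraph above. The text does not give bit positions or most field widths, so the layout here is entirely hypothetical; only the four-bit width of the sync condition or C-state field follows an embodiment the text mentions. The FIELDS table and the pack and unpack helpers are illustrative names.

```python
# Hypothetical layout: (field name, width in bits), packed from bit 0 upward.
FIELDS = [
    ("wakeup_events", 8),
    ("sleep", 1),
    ("sel_wake", 1),
    ("s", 1),
    ("c", 1),
    ("sync_cond_or_cstate", 4),
    ("core_set", 2),
    ("force_sync", 1),
    ("sel_kill", 1),
    ("core_disable", 1),
    ("pg", 1),
]
FIELD_NAMES = {name for name, _ in FIELDS}


def pack(**values) -> int:
    """Build a control word from named field values (unset fields are zero)."""
    unknown = set(values) - FIELD_NAMES
    if unknown:
        raise ValueError(f"unknown field(s): {sorted(unknown)}")
    word, shift = 0, 0
    for name, width in FIELDS:
        v = values.get(name, 0)
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bit(s)")
        word |= v << shift
        shift += width
    return word


def unpack(word: int) -> dict:
    """Split a control word back into its named fields."""
    out, shift = {}, 0
    for name, width in FIELDS:
        out[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return out


# A sync request that also asks to sleep, with sync condition value 3.
request = pack(s=1, sleep=1, sync_cond_or_cstate=3)
fields = unpack(request)
```

A table-driven layout keeps pack and unpack in agreement if the assumed widths are changed.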
The wakeup events field 204 of the control word 202 comprises a plurality of bits that correspond to different events. If the core 102 sets a bit in the wakeup events field 204, the control unit 104 wakes up the core 102 (i.e., turns on the clock 122 to the core 102) when the event that corresponds to the bit occurs. One wakeup event occurs when the core 102 has synced with all the other cores 102 specified in the core set field 228. In one embodiment, the core set field 228 may specify all the cores 102 of the microprocessor 100; all the cores 102 that share a cache memory (e.g., an L2 cache and/or L3 cache) with the instant core 102; or all the cores 102 on the same semiconductor die as the instant core 102 (see FIG. 4 for an example embodiment that describes a multi-die, multi-core microprocessor 100). A set of cores 102 together with the cache memory they share is referred to as a slice. Other examples of wakeup events include an x86 INTR, SMI, or NMI, assertion or de-assertion of STPCLK, and an inter-core interrupt. Once awakened, a core 102 may read the wakeup events field 244 of the status word 242 to determine the active wakeup events.
If the core 102 sets the PG bit 208, the control unit 104 turns off power to the core 102 (e.g., via the PWR signal 128) after it has put the core 102 to sleep. The control unit 104 clears the PG bit 208 when it subsequently restores power to the core 102. The use of the PG bit 208 is explained in detail with regard to FIGS. 11-13.
If the core 102 sets the sleep bit 212 or the sel wake bit 214, the control unit 104 puts the core 102 to sleep after the core 102 writes the sync register 108, using the wakeup events specified in the wakeup events field 204. The sleep bit 212 and the sel wake bit 214 are mutually exclusive. They differ in how the control unit 104 reacts to a sync condition. If a core 102 sets the sleep bit 212, the control unit 104 wakes up all the cores 102 when the sync condition occurs. In contrast, if a core 102 sets the sel wake bit 214, the control unit 104 wakes up only the last core 102 that wrote the sync condition to its sync register 108.
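The difference between the two bits comes down to which cores the control unit wakes once the sync condition occurs. The helper below is a minimal sketch of that decision, assuming that all cores taking part in the sync used the same bit and that the selectively awakened core is the last one to write its sync request; the function name and the string labels for the modes are hypothetical.

```python
def cores_to_wake(mode: str, synced_cores: list[int], last_writer: int) -> list[int]:
    """Return the cores the control unit wakes when the sync condition occurs.

    mode is the bit the cores set in their sync requests:
      "sleep"    -> every synced core is awakened
      "sel_wake" -> only the last core to write its sync register is awakened
      "none"     -> no core was put to sleep, so there is nothing to wake
    """
    if last_writer not in synced_cores:
        raise ValueError("last_writer must be one of the synced cores")
    if mode == "sleep":
        return sorted(synced_cores)
    if mode == "sel_wake":
        return [last_writer]
    if mode == "none":
        return []
    raise ValueError(f"unknown mode: {mode}")


all_awake = cores_to_wake("sleep", [0, 1, 2, 3], last_writer=2)
one_awake = cores_to_wake("sel_wake", [0, 1, 2, 3], last_writer=2)
```

Selective wakeup suits cases where a single core should carry out an action on behalf of the others while they stay asleep.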
If the core 102 sets neither the sleep bit 212 nor the sel wake bit 214, the control unit 104 does not put the core 102 to sleep and therefore does not wake it up when a sync condition occurs. However, the control unit 104 still sets the bit in the wakeup events field 244 that indicates a sync condition is active, so the core 102 can detect that the sync condition has occurred. Many of the events that can be specified in the wakeup events field 204 may also be interrupt sources for which the control unit 104 can generate an interrupt to the core 102. If desired, however, the microcode of the core 102 may mask the interrupt sources. When the core 102 is awakened, its microcode may read the status register 106 to determine whether a sync condition or a wakeup event has occurred.
If the core 102 sets the S bit 222, it asks the control unit 104 to sync on a sync condition. The sync condition is specified by a combination of the C bit 224, the sync condition or C-state field 226, and the core set field 228. If the C bit 224 is set, the C-state field 226 specifies a C-state value; if the C bit 224 is clear, the sync condition field 226 specifies a non-C-state sync condition. Preferably, the values of the sync condition or C-state field 226 are a bounded set of non-negative integers. In one embodiment, the sync condition or C-state field 226 comprises four bits. When the C bit 224 is clear, a sync condition occurs when all cores 102 of the specified core set 228 have written their respective sync registers 108 with the S bit 222 set and with the same value of the sync condition field 226. In one embodiment, the values of the sync condition field 226 correspond to unique sync conditions, such as the various sync conditions of the exemplary embodiments described below. When the C bit 224 is set, a sync condition occurs when all cores 102 of the specified core set 228 have written their respective sync registers 108 with the S bit 222 set, regardless of whether they have written the same value of the C-state field 226. In this case, the control unit 104 posts the lowest written value of the C-state field 226 to the lowest common C-state field 246 of the status register 106, where it can be read by a core 102 (e.g., by a master core 102 or by a selectively awakened core 102 at block 1108). In one embodiment, if the core 102 specifies a predetermined value (e.g., all bits set) in the sync condition field 226, this instructs the control unit 104 to match the instant core 102 with any sync condition field 226 value specified by the other cores 102.
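The two matching rules (same value required when the C bit is clear, lowest value reported when it is set) can be sketched as a single evaluation function. This is a simplified model under several assumptions that the text does not confirm: the field is four bits wide as in the one embodiment mentioned, the "all bits set" wildcard applies only when the C bit is clear, and requests that mix set and clear C bits never form a sync condition. The names evaluate_sync and MATCH_ANY are hypothetical.

```python
MATCH_ANY = 0b1111  # assumed wildcard: all bits set in a four-bit field


def evaluate_sync(core_set, writes):
    """Decide whether a sync condition has occurred.

    core_set: iterable of core numbers that must take part in the sync.
    writes:   dict core -> (c_bit, value) for cores that wrote their sync
              register with the S bit set.
    Returns (occurred, lowest_common_cstate); the second item is None unless
    the C bit is set and the sync condition occurred.
    """
    core_set = list(core_set)
    if any(core not in writes for core in core_set):
        return False, None  # at least one core has not written yet
    entries = [writes[core] for core in core_set]
    c_bits = {c_bit for c_bit, _ in entries}
    if len(c_bits) != 1:
        return False, None  # assumption: mixed C bits never match
    if c_bits == {1}:
        # C-state sync: values may differ; report the lowest one written.
        return True, min(value for _, value in entries)
    # Non-C-state sync: all non-wildcard values must be identical.
    values = {value for _, value in entries if value != MATCH_ANY}
    return len(values) <= 1, None


cstate_result = evaluate_sync([0, 1, 2], {0: (1, 3), 1: (1, 5), 2: (1, 4)})
match_result = evaluate_sync([0, 1], {0: (0, 2), 1: (0, 2)})
mismatch_result = evaluate_sync([0, 1], {0: (0, 2), 1: (0, 4)})
```

Reporting the minimum in the C-state case reflects the idea that the package can enter only as deep a state as its least-idle core has requested.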