Storage System Design in Graphics Processing Unit (GPU) Based on PCM

Time: 2022-10-18 07:29:51

To better optimize the performance of the PCM storage system, we study the characteristics of memory write operations in GPU programs. Using a simulator, we count the number of write operations to each line of memory. In most programs the memory accesses show locality: writes are not uniformly distributed but are concentrated on a small set of memory addresses.
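The per-line write statistics described above can be sketched as follows. This is a minimal illustration, not the simulator's actual instrumentation: the line size of 128 bytes, the function names, and the short address trace are all assumptions made for the example.

```python
from collections import Counter

LINE_SIZE = 128  # bytes per memory line; an assumed value for illustration


def write_counts_per_line(write_addresses, line_size=LINE_SIZE):
    """Count how many write operations fall on each memory line."""
    return Counter(addr // line_size for addr in write_addresses)


def top_share(counts, k):
    """Fraction of all writes absorbed by the k most-written lines."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    hottest = sum(n for _, n in counts.most_common(k))
    return hottest / total


# Hypothetical write trace (byte addresses), not measured data.
trace = [0, 4, 8, 130, 0, 12, 4096, 16, 0, 132]
counts = write_counts_per_line(trace)
print(counts.most_common(3))
print(top_share(counts, 1))
```

A share close to 1 for a small k indicates that writes are concentrated on a few lines, which is the property the buffer design in the next section relies on.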

3. Storage System Optimization Based on PCM

Design of a special buffer unit for PCM: the simulation above shows that GPU programs concentrate a large share of their write operations on a few memory lines. Storing the data of these lines in a dedicated buffer unit reduces the number of accesses to PCM memory and so improves program execution performance. The buffer unit is designed according to the following principles. Because the buffer unit does not know before execution which memory addresses a program will access frequently, it must identify frequently accessed memory lines dynamically while the program runs. In addition, because the PCM read delay is much smaller than the write delay, the buffer unit should hold the data of memory write operations as far as possible. This paper designs a buffer unit placed between the cache and the PCM. It consists of two parts, a data storage unit and a data management unit. The data storage unit records the line address and data of each memory access request; the data management unit manages the access frequency and the replacement policy of the data storage unit. To track how frequently memory addresses are accessed, the data management unit maintains two lists: one records the addresses of memory lines with active write access, and the other records the addresses of inactive lines. Each entry in the lists corresponds to a line address in the data storage unit, and the total number of lines in the data storage unit equals the combined size of the two lists. Within a list, the most frequently accessed memory line is at the left end and the least frequently accessed memory line is at the right end.

Unlike traditional methods, this system updates the address lists differently for read and write operations when it processes an access request. Specifically, when a memory access request misses in the cache, the special buffer handles it. The request address is compared with the addresses stored in the buffer, and there are two cases. In the first case, the request hits in the address lists, meaning the requested address is present in the buffer. The requested data is then obtained directly from the buffer without going to PCM. For a read request, the lists are not updated. For a write request, the lists are updated according to the following rules: if the address is in the active list, it moves one position to the left in that list; if the address is in the inactive list, it replaces the address at the bottom of the active list. The address originally at the bottom of the active list simply moves to the right, and the other addresses in the inactive list do not need to move. In the second case, the requested address is not stored in the buffer, so the data must be obtained from PCM and then stored in the buffer. If the buffer is full, an existing line must be replaced to make room for the new data. The replaced line is the one whose address is at the bottom of the inactive list; the new address is stored at the bottom of the active list, and the other addresses in the inactive list shift to the right.
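The hit and miss handling described above can be sketched in a few lines of Python. This is one possible reading of the rules, not the paper's hardware design. The class name `TwoListBuffer`, the dirty-line tracking, the write-back on eviction, and the assumption that a write supplies a whole line (so a write miss does not read PCM) are all illustrative choices. A plain dict stands in for PCM, and index 0 of each list is the "left" end.

```python
class TwoListBuffer:
    """Sketch of a write-aware buffer in front of PCM (names are hypothetical)."""

    def __init__(self, active_size, inactive_size, pcm):
        self.active_size = active_size      # assumed to be at least 1
        self.inactive_size = inactive_size
        self.active = []      # addresses of frequently written lines
        self.inactive = []    # addresses of infrequently written lines
        self.lines = {}       # data storage unit: line address -> data
        self.dirty = set()    # lines modified since they entered the buffer
        self.pcm = pcm        # dict standing in for PCM main memory
        self.pcm_reads = 0
        self.pcm_writes = 0

    def read(self, addr):
        if addr not in self.lines:          # miss: fetch the line from PCM
            self.pcm_reads += 1
            self._insert(addr, self.pcm.get(addr))
        return self.lines[addr]             # a read hit leaves the lists untouched

    def write(self, addr, data):
        if addr in self.lines:              # hit: update the data, then promote
            self.lines[addr] = data
            self._promote(addr)
        else:                               # miss: allocate a line for the write
            self._insert(addr, data)
        self.dirty.add(addr)

    def _promote(self, addr):
        if addr in self.active:
            i = self.active.index(addr)
            if i > 0:                       # move one position to the left
                self.active[i - 1], self.active[i] = addr, self.active[i - 1]
        else:                               # swap with the tail of the active list
            j = self.inactive.index(addr)
            self.inactive[j], self.active[-1] = self.active[-1], addr

    def _insert(self, addr, data):
        self.lines[addr] = data
        if len(self.active) < self.active_size:
            self.active.append(addr)
            return
        # The new address takes the active tail; the old tail is demoted.
        demoted, self.active[-1] = self.active[-1], addr
        self.inactive.insert(0, demoted)    # other inactive entries shift right
        if len(self.inactive) > self.inactive_size:
            self._evict(self.inactive.pop())  # rightmost inactive entry leaves

    def _evict(self, addr):
        if addr in self.dirty:              # only modified lines go back to PCM
            self.pcm[addr] = self.lines[addr]
            self.pcm_writes += 1
            self.dirty.discard(addr)
        del self.lines[addr]


pcm = {}
buf = TwoListBuffer(active_size=2, inactive_size=2, pcm=pcm)
for addr in (10, 20, 30):
    buf.write(addr, f"v{addr}")
buf.write(20, "v20b")   # 20 sits in the inactive list: swaps with the active tail
buf.write(20, "v20c")   # 20 sits in the active list: moves one step left
print(buf.active, buf.inactive, buf.pcm_writes)
```

In this sketch, repeated writes to line 20 are absorbed by the buffer and reach PCM only if the line is later evicted, which is the effect the design aims for.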

First, PCM memory: PCM is a new storage technology that stores binary data using the difference between the crystalline and amorphous states of a phase-change material. Compared with DRAM, PCM has several advantages, including higher storage density, so using it as the processor's main memory effectively expands the storage space. In addition, PCM is non-volatile: retaining data requires no extra energy, and it incurs neither leakage power nor refresh power. In terms of data access, the two types of memory have roughly the same read delay, but the write-back delay of PCM is much longer than that of DRAM. If DRAM is directly replaced with PCM, program performance will therefore suffer from write-back operations.

Second, the PCM-based main memory: for the test environment, we use a simulator developed by a university. The simulator is cycle-accurate, so it can effectively model the architectural features of current mainstream GPUs, and each module and its parameters can be configured flexibly to suit the simulation requirements. In this experiment, the simulator models a GPU architecture developed by NVIDIA. The number of processors, the cache capacity, and the memory timing parameters are taken from the relevant documentation. The DRAM parameters are given in the memory timing parameter list. When PCM is selected as the main memory, the PCM timing parameters replace the DRAM timing parameters.
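To illustrate why replacing only the timing parameters can hurt performance, the sketch below models a memory by its read and write delays and computes the average access delay for a given share of writes. The cycle counts and the write fraction are placeholders chosen for the example, not the parameters used in the experiment, and `MemoryTiming` and `average_latency` are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MemoryTiming:
    name: str
    read_cycles: int
    write_cycles: int


def average_latency(timing, write_fraction):
    """Mean access delay in cycles for a given fraction of write accesses."""
    reads = (1 - write_fraction) * timing.read_cycles
    writes = write_fraction * timing.write_cycles
    return reads + writes


# Placeholder values: equal read delay, much longer PCM write delay.
DRAM = MemoryTiming("DRAM", read_cycles=100, write_cycles=100)
PCM = replace(DRAM, name="PCM", write_cycles=400)

for mem in (DRAM, PCM):
    print(mem.name, average_latency(mem, write_fraction=0.25))
```

Even with identical read delays, the average delay grows with the share of writes, which is why the optimization focuses on keeping written lines out of PCM.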

For the test programs, we select 8 GPU programs taken from real applications in signal processing and data mining. All test programs are implemented in CUDA.

For the test results, we run the GPU with PCM and with DRAM as the main memory and compare their performance, using IPC (instructions per cycle) as the main performance metric. The program performance of the PCM configuration is normalized to that of the DRAM configuration. The simulation results show that selecting PCM as the main memory degrades overall program performance to some degree, with the average loss reaching about 1/3. The cause is the long write delay, which affects the performance of the whole system.
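The normalization used in this comparison can be written out as a short calculation. The instruction and cycle counts below are hypothetical values chosen so that the loss comes out at one third; they are not measurements from the experiment.

```python
def ipc(instructions, cycles):
    """Instructions per cycle."""
    return instructions / cycles


def normalized_ipc(ipc_pcm, ipc_dram):
    """IPC of the PCM configuration relative to the DRAM configuration."""
    return ipc_pcm / ipc_dram


# Hypothetical run: same instruction count, more cycles with PCM.
instructions = 1_000_000
ipc_dram = ipc(instructions, 500_000)
ipc_pcm = ipc(instructions, 750_000)

norm = normalized_ipc(ipc_pcm, ipc_dram)
loss = 1 - norm
print(f"normalized IPC = {norm:.3f}, loss = {loss:.3f}")
```

A normalized IPC of about 0.667 corresponds to the one-third average loss reported above.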

Compared with a traditional cache structure, the buffer structure designed in this paper has several advantages. Each line of stored data is larger, and each line in the buffer corresponds to one line of data in memory. As a result, the per-line access counts in the buffer directly reflect the uneven distribution of a GPU program's accesses across memory lines. A traditional cache spreads the data of one memory line over many cache lines, so it cannot directly reflect this uneven distribution. In addition, the special buffer treats read and write operations differently. A write operation raises the priority of its address in the address list, whereas a read operation does not change the priority of addresses in the list. This differential treatment keeps frequently written data in the buffer and matches the asymmetry between PCM write and read delays, which effectively reduces PCM access time and improves system performance. A traditional cache handles read and write operations in the same way and therefore cannot achieve this.

4. Conclusion

The analysis above shows that processors place ever higher performance demands on memory. Traditional memory has gradually revealed more and more shortcomings in practice, which has led to new non-volatile memories such as PCM. When PCM is used in a graphics processor, however, program performance suffers some loss. To address this, this paper designs a special buffer unit that effectively improves system performance and is therefore worth adopting more widely. This paper has briefly analyzed the design of a PCM-based storage system in the graphics processor (GPU), in the hope of providing a useful reference.

