Memories have been moving on up for a while now, but there’s still more stacking that could be done to meet the performance requirements of memory-hungry applications, especially artificial intelligence (AI).
But stacking memories does create its own set of challenges, whether it’s stacking cells to create the memory devices themselves or stacking memory devices together within a system to meet application requirements. The push to stack memory is driven in part by the need to reduce how far data has to travel. Specifications such as Compute Express Link (CXL) are already enabling a variety of memories that can be placed closer to compute, while interconnects are also seen as a key part of addressing AI demands as well as other use cases where latency is a key consideration.
These approaches are in addition to the kinds of memory that are employed. Graphics DDR (GDDR) SDRAM has grown beyond its roots as the memory used primarily in graphics cards for high-end PCs. It has come to occupy a “Goldilocks zone” of sorts because it offers the speed and high performance necessary for AI, autonomous vehicles, and 5G networking. It’s faster than standard DDR DRAM but still slower than pricier high bandwidth memory (HBM). NAND flash, meanwhile, can also be used in some AI applications, but it’s going to be slower than DRAM.
Today’s NAND flash memory, of course, is 3D, and represents one aspect of stacking that helps to address the memory demands of today’s applications. HBM, meanwhile, vertically stacks memory chips on top of one another and connects them through through-silicon vias (TSVs) and microbumps. Using TSVs, however, dramatically increases manufacturing costs, which is why HBM remains a premium memory solution, even as AI and hyperscalers have opened the floodgates for memory demand, including HBM.
As data movement has begun to dominate certain phases of AI applications, memory interconnects are becoming increasingly important. However, increasing data rates is challenging, even as the expectation is that not only the speed of data movement but also power efficiency will keep doubling. But as Rambus fellow and distinguished inventor Steve Woo points out, many of the techniques that have been relied on are no longer available or are slowing down, which is why new architectures and new ways of moving data must be explored.
He said Rambus has seen a lot of interest in 3D stacking, but without an increase in bandwidth commensurate with the increased capacity of the stack, there are limits to usability that need to be solved. “We’ve been doing some work to look at things like HBM memory and actual memory systems that use HBM.” That includes getting a better understanding of where all the power is being spent, which Woo said is in sending a lot of data at high bandwidth between the SOC and DRAM. If you break it down further, there are circuits on the SOC, the equivalent circuits on the DRAM, and the actual core of the DRAM where the bits are stored.
Even with HBM2, Woo said it’s still “pretty shocking” that two-thirds of the power is spent in moving the data back and forth. “About one-third is spent in actually storing and retrieving the data from the core of the DRAM.” What this means is there’s a lot of power being spent moving data that’s not helping with the computation. It’s generally understood that the stacking involved in HBM2 saves power when compared with other monolithic DRAMs, he said, and if you can stack processing close to where the data is, you can reduce the data movement, and hence dramatically reduce the power. The goal becomes finding some way to take the processing that happens on the SOC and somehow add it to the stack where the DRAM is.
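To put the two-thirds/one-third split in concrete terms, here is a minimal back-of-the-envelope sketch. It assumes the split Woo describes applies to the whole memory system, that DRAM core power stays unchanged, and that any power drawn by processing added to the stack is ignored. The reduction percentages are hypothetical illustration values, not figures from Woo or Rambus, and the function name is invented for this example.

```python
from fractions import Fraction

# Split described by Woo for HBM2: roughly two-thirds of the power goes to
# moving data, roughly one-third to storing and retrieving it in the DRAM core.
MOVEMENT_SHARE = Fraction(2, 3)
CORE_SHARE = Fraction(1, 3)


def remaining_power(movement_reduction):
    """Return the fraction of the original power still consumed after
    cutting data-movement power by `movement_reduction` (0 to 1).

    Assumption: core power is unchanged and no new power cost is added.
    """
    reduction = Fraction(movement_reduction)
    if not 0 <= reduction <= 1:
        raise ValueError("movement_reduction must be between 0 and 1")
    return CORE_SHARE + MOVEMENT_SHARE * (1 - reduction)


# Hypothetical reductions in data-movement power, for illustration only.
for reduction in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)):
    left = remaining_power(reduction)
    print(f"movement cut by {float(reduction):.0%}: "
          f"{float(left):.1%} of original power remains")
```

Under these assumptions, even eliminating data movement entirely leaves the one-third spent in the DRAM core, which is why the split matters when judging how much stacking compute near the data can save.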
HBM stacks and the DRAM layers that make them up differ from the layers found in 3D NAND. HBM has DRAM layers packed full of bit cells connected with TSVs, with an extra die at the bottom called the base layer. “The job of the base layer is really to provide that electrical connectivity to the outside world, so it acts a lot like a buffer,” said Woo. This buffer is now being considered as a place to put some processing, but it raises a lot of issues, including thermal management. “The challenge is really stacking everything together,” he said. “Where do you put the processor and how do you cool everything? The processor’s going to likely get hotter, and the DRAM layers don’t really like to be hot because it affects the retention time.”
There are also the microbumps that act as standoffs between the layers, but there are areas that have no bumps, which means some sort of fill is required. “You have to kind of fill it with something,” said Woo. The fill improves the stability of the device; if those areas were left as air gaps, it could have a negative effect on thermal management.
NAND is grappling with its own challenges as more layers are added to achieve denser flash memories. As 3D NAND is stacked higher, it has exposed the limitations of current filling methods. Traditional methods, such as chemical vapor deposition, diffusion/furnace, and spin-on processes, have hit their limits due to trade-offs between quality, shrinkage, and gapfill voids, which companies such as Lam Research are addressing with new deposition technologies to keep up with the changing fill requirements.