Like many conferences over the past year, the 2021 International Memory Workshop (IMW2021) was virtual, with a wide array of presentations and papers on memory spanning NAND, DRAM, ReRAM and FRAM.
Not surprisingly, meeting the memory requirements of artificial intelligence (AI) and machine learning was a strong theme throughout the keynotes and tutorials. New paradigms such as in-memory computing, 3D memories beyond NAND flash, and advancing ReRAM for brain-inspired neuromorphic computing were just a small sample of what was covered by presenters from around the world.
In-memory computing shows promise
One of the benefits of in-memory computing is the ability to continuously cache vast amounts of data while ensuring rapid response times for searches, but there’s more than one way to do it.
In her tutorial, Catherine Graves, principal research scientist at Hewlett Packard Labs, outlined how analog and digital content-addressable memories (CAMs) could be used for a wide array of applications in security, genomics, and decision trees. “If you’re not familiar with a CAM, it’s functionally kind of the reverse of a random-access memory. You give your memory some kind of address, you send that address to your RAM block, and you read out the data stored within that address location.” The output from the CAM is going to be the location of a match, if one exists, between the search data and the stored data, she said. “These content-addressable memories are giving you a high throughput look up operation.”
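The RAM-versus-CAM contrast Graves describes can be sketched in a few lines of Python. This is only a toy software model of an exact-match digital CAM, not HPE's design; the function names and the stored words are hypothetical, and real hardware compares all rows in parallel rather than looping.

```python
# Toy model contrasting RAM and CAM lookups (illustrative only).

def ram_read(memory, address):
    """RAM: an address goes in, the data stored at that address comes out."""
    return memory[address]


def cam_search(memory, search_word):
    """CAM: data goes in, the addresses of all matching rows come out.

    A hardware CAM compares every row at once; this loop only models
    the result of that parallel compare.
    """
    return [addr for addr, word in enumerate(memory) if word == search_word]


# Hypothetical stored words, one per row.
stored = ["1010", "0111", "1010", "0001"]

print(ram_read(stored, 1))          # data held at address 1
print(cam_search(stored, "1010"))   # addresses whose contents match
print(cam_search(stored, "1111"))   # empty list when nothing matches
```

The search returns locations rather than data, which is why a single CAM lookup can stand in for what would otherwise be a scan over the whole memory.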
Analog and digital CAMs can be implemented for applications in security, genomics, and decision trees, Graves said, and map a diverse range of computing models. Her group at HPE has been exploring analog content-addressable memory based on memristor-based crossbar devices and using them for computation. “We can use these non-volatile analog tuneable devices to program a matrix of values within a crossbar.” A voltage vector is applied along the rows and the currents collected across the columns to perform vector matrix multiplications, she said. “This has wide applications in machine learning and fully connected neural networks, an area obviously of intense study, but also in scientific computing.”
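The crossbar operation Graves describes follows from Ohm's law and Kirchhoff's current law: each device contributes a current equal to its conductance times the row voltage, and each column wire sums those currents. A minimal sketch, assuming ideal devices with no wire resistance or noise; the function name and the numeric values are made up for illustration.

```python
# Idealized memristor crossbar: column currents = voltages x conductances.

def crossbar_vmm(voltages, conductances):
    """Return the current collected on each column.

    voltages[i]        : voltage applied to row i
    conductances[i][j] : programmed conductance at row i, column j
    Column current j is sum_i voltages[i] * conductances[i][j].
    """
    rows = len(conductances)
    cols = len(conductances[0])
    if len(voltages) != rows:
        raise ValueError("need one voltage per crossbar row")
    return [
        sum(voltages[i] * conductances[i][j] for i in range(rows))
        for j in range(cols)
    ]


# Hypothetical 3x2 crossbar in arbitrary units.
G = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
V = [1.0, 0.5, 2.0]

currents = crossbar_vmm(V, G)
print(currents)  # one summed current per column
```

In hardware the whole multiplication happens in one read step, since every device conducts at the same time; the nested loop here only reproduces the arithmetic.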
CAMs are already used in a lot of commercial systems that require a high throughput compare operation, such as networking when doing lookups for IP routing to manage quality of service, noted Graves. “They’re incredibly high-performance structures.” But they’re also closer to how human memories work in that they receive inputs, just as we receive a smell or a sound that opens access to our brain and our system of remembering something.
Hyperdimensional computing, as explained by Manuel Le Gallo, research staff member with IBM Research’s in-memory computing group, is also “brain inspired.” It uses in-memory hardware to overcome some of the challenges with current deep learning computing. Deep learning has already been significantly enabled by semiconductor technologies, but it still isn’t as efficient as the human brain, and it’s not energy efficient when doing training, he said. “Essentially the energy that is spent just to train the model is equivalent to two weeks of home energy consumption.” This makes it particularly prohibitive in Internet of Things (IoT) and edge computing devices.
Another key challenge for deep learning is device complexity: most AI models today run on big servers, said Le Gallo, and they have a whole host of reliability, networking, security and control requirements, yet still don’t perform as well as the human brain. One way to address complexity is to use something other than deep learning and adopt a simpler model of operations during the training and inference phases of machine learning, which he calls hyperdimensional computing. To address energy efficiency, it uses in-memory computing hardware rather than the traditional von Neumann architecture.
“The whole idea of hyperdimensional computing is to use hyperdimensional vectors to represent data.” Rather than represent data with 32 or 64 bits in a certain order, hyperdimensional vectors are random bit streams of ones and zeros with dimensions in the 10,000s. “It’s a general and scalable model of computing,” he said, and it can learn very fast. It’s also extremely memory-centric, which makes it very amenable to being implemented with in-memory computing hardware. “It’s a purely statistical framework, so it’s just driving on randomness.”
Hyperdimensional computing combines in-memory storage and an encoding architecture, and the data is encoded on a phase-change memory (PCM) device, with the ones and zeros using high resistance and low resistance states, respectively. The full system implemented by Le Gallo’s research group using PCM produced accurate results and suggests in-memory hyperdimensional computing would be six times more energy efficient than an equivalent conventional digital CMOS implementation.
Although there is room for improvement, Le Gallo said, the research demonstrated it was possible to design a complete “brain inspired” in-memory hyperdimensional computing system that achieves near-software-equivalent accuracy on the PCM chip while being energy efficient, fast, robust and transparent, even though it uses imperfect nanoscale devices.
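To give a feel for why randomness is enough, here is a minimal sketch of the standard binary hypervector operations in plain Python. It is a generic textbook-style illustration, not the IBM group's system; the helper names, the choice of XOR binding and majority-vote bundling, and the seed are all assumptions made for this example.

```python
import random

DIM = 10_000  # dimensionality in the range the article mentions


def random_hv(rng):
    """Random binary hypervector of DIM ones and zeros."""
    return [rng.randint(0, 1) for _ in range(DIM)]


def bind(a, b):
    """Bind two hypervectors with element-wise XOR (its own inverse)."""
    return [x ^ y for x, y in zip(a, b)]


def bundle(vectors):
    """Combine hypervectors with a per-position majority vote."""
    threshold = len(vectors) / 2
    return [1 if sum(bits) > threshold else 0 for bits in zip(*vectors)]


def similarity(a, b):
    """Fraction of positions where two hypervectors agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
a, b, c = (random_hv(rng) for _ in range(3))
prototype = bundle([a, b, c])   # one vector summarizing a, b and c
unrelated = random_hv(rng)

sim_member = similarity(prototype, a)             # well above chance
sim_unrelated = similarity(prototype, unrelated)  # close to 0.5 (chance)
print(sim_member, sim_unrelated)
```

Two unrelated random vectors of this length agree on roughly half their positions, so anything clearly above that stands out as related. Classification then reduces to comparing a query against stored prototype vectors, which is the kind of memory-centric search that suits in-memory hardware.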
Emulating how the brain works is also the goal of neuromorphic computing, another area of discussion at IMW2021, including how 3D memories may help to realize it.
3D isn’t just for NAND flash
There’s been a lot of discussion of late about how memories other than NAND flash could go three-dimensional, as well as how memory devices could best be stacked to deliver the speed, performance and thermal properties to meet the demands of high-performance workloads such as artificial intelligence and machine learning.
Already, it looks as though the future of DRAM may be 3D, but it’s unclear how it will happen. One aspect explored by Xi’an UniIC Semiconductors Co. Ltd. CEO Qiwei Ren is the use of 3D integration technology for embedded DRAM that overcomes what he calls the “memory wall” that results from the von Neumann architecture. The performance gap between the CPU and memory keeps widening, he said, and more time is being spent transferring data than processing data, and more power is being used for transfers than for actual computing.
Current memory system integration approaches include PCBs, SiPs using wire bonding, TSVs and interposers in HBMs, and embedded memory. Each has its own performance challenges in terms of bandwidth, latency, power, and reliability, said Ren. Form factors and system cost are also important considerations.
His work on stacked embedded DRAM (SeDRAM) employs a 3D hybrid bonding process, with logic being placed on top of the SeDRAM. Some of the advantages of this approach include a “via” level interconnection in a single chip from an SoC to a DRAM, a flexible logic-to-memory interface, and vertical interconnection. There’s no need for high-speed data bus wires, and no extra large PHY is required in the SoC or DRAM, said Ren. The power consumption necessary to transfer data is reduced compared with other memory integrations. “The system design is much simpler, and the costs are much lower.”
Some of the applications for SeDRAM include a last level cache that combines a CPU logic wafer and SeDRAM wafer together, making it a low-cost solution for very high-density cache, or as part of a “sandwich solution” for a system in one chip with the upper layer for computing, a middle layer for data cache and a lower layer for data storage. “This would be a very promising solution for power-sensitive applications such as IoT,” said Ren. “We can achieve low power consumption levels and, in the meantime, very small form factors.”
Resistive memories may also benefit from 3D to increase density, particularly to meet the requirements of AI, which is also quite energy intensive, said Elisa Vianello, senior scientist and embedded AI program director at CEA-Leti. It also places significant demands on storage and computation. “Today’s computing architectures are inefficient in rendering AI tasks. The energy costs of moving data between the processor and the memory can reach 90% of the total energy consumption.” She said new chip architectures are required to address the power and area constraints in many AI application areas such as edge, embedded and IoT.
“Memory is at the center of the energy challenge,” said Vianello, in large part because so much data is required in the training phase. “A single AI model stores millions of parameters.” Mix in the high power costs associated with moving the data between the memory and the processor, and it’s clear there’s a need for high density on-chip memories, she said. “Resistive memories can play a crucial role.”
The appeal of resistive memories is that they are fast, non-volatile memories that can be embedded at the core of the CMOS and come in different variants, including phase change (PCRAM), magnetic (MRAM), and ferroelectric (FRAM). Vianello presented two strategies for resistive memory architectures: a multiple bits-per-cell approach using ReRAM, and 3D integration. In the first scenario, programming multiple bits per cell can be challenging, she said, and requires a smart, iterative programming strategy. The second combines the first with 3D sequential integration.
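An iterative programming strategy for multi-level cells is commonly built as a program-and-verify loop: apply a pulse, read the cell back, and repeat until the stored value lands inside the target window. The sketch below is a generic illustration of that idea, not the scheme Vianello presented; the device model (normalized conductance, random pulse efficiency), the four levels, and all parameter values are invented for the example.

```python
import random


def program_cell(target, rng, tolerance=0.02, max_pulses=200):
    """Program-and-verify loop on a made-up device model.

    Conductance is normalized to the range 0..1. Each pulse moves the
    cell toward the target by a randomly varying fraction of the
    remaining error, standing in for device-to-device variability.
    Returns (final_conductance, pulses_used).
    """
    g = 0.0  # hypothetical starting (reset) state
    for pulses_used in range(max_pulses):
        error = target - g
        if abs(error) <= tolerance:       # verify step: read and compare
            return g, pulses_used
        # program step: imperfect pulse, 25% to 75% of the error corrected
        g += 0.5 * error * rng.uniform(0.5, 1.5)
    return g, max_pulses


# Two bits per cell means four distinguishable levels (hypothetical values).
LEVELS = [0.0, 1 / 3, 2 / 3, 1.0]

rng = random.Random(42)
results = [program_cell(level, rng) for level in LEVELS]
for level, (g, pulses) in zip(LEVELS, results):
    print(f"target={level:.3f} reached={g:.3f} pulses={pulses}")
```

The tighter the tolerance, the more levels fit into one cell but the more pulses each write needs, which is the trade-off that makes multi-bit programming harder than single-bit programming.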
Also known as 3D monolithic, 3D VLSI or CoolCube, the two-tier 3D sequential integration process flow starts with a bottom MOSFET with or without interconnects, followed by creation of a top active layer with hydrophilic bonding and thermal anneal, and then a top MOSFET process. Tier 2 is a 3D contact BEOL. “It is different than 3D packaging where the tiers are fabricated in parallel on two different wafers followed by a stacking or bonding step.”
Vianello said the monolithic approach is limited only by lithographic capabilities rather than bond alignment, which allows much improved 3D connectivity. This base wafer layer is followed by ReRAM integration above to make what is called a 1T1R architecture that benefits from the high density of vertically stacked NS transistors, which she said features excellent scalability and exhibits the same electrical behavior as the planar one.
Moving forward, the research involves building a 3D “My-Cube” that combines gate-all-around (GAA) nanowires, vertical resistive memories, and 3D monolithic integration, said Vianello, and the proposed architecture benefits from the highest density of vertically stacked NS transistors. Each horizontal GAA channel features an independent source connected to a bitline and a drain directly connected to a pillar of ReRAM memory cells.
Vianello said high density ReRAM devices achieved through multi-level programming and 3D integration, as well as crossbar architectures, will enable the design of new energy efficient AI hardware based on near and in-memory computing, including brain-inspired neuromorphic processors.
Presented by the IEEE and the Electron Devices Society, the IMW2021’s entire four-day program is available online until June 20, including both pre-recorded material and live Q&As. Interested attendees can still register until June 15.
Gary Hilson is a general contributing editor with a focus on memory and flash technologies for EE Times.
The post IMW Highlights 3D Architectures, In-Memory Computing appeared first on EETimes.