Making RNNs Actually Work: LSTMs, Bidirectionality, and the Encoder-Decoder

Akash
Posted on May 31
#machinelearning #deeplearning #nlp #ai

Part 7 of Natural Language Processing and Text Mining (7 Part Series)

Stacking, Bidirectionality, the Encoder-Decoder, and LSTMs

Last post ended with a simple RNN and three promises: LSTMs, bidirectional RNNs, and attention. This post delivers the first two, plus the refinements that turn a working-on-paper RNN into something you'd actually deploy.

By the end, you'll know how to stack RNNs for depth, why reading a sentence backward as well as forward (bidirectionality) makes representations sharper, how the encoder-decoder turns one sequence into a different one for machine translation, and exactly what breaks in a simple RNN that LSTMs and GRUs were invented to fix. Attention, the fix for the last problem we'll hit, gets its own home in the transformer post, so we'll stop right at the edge of it.

The simple RNN was the idea. This post is the engineering. A vanilla RNN carries a thread of hidden state through time, but in practice, that thread frays on long sequences, only sees the past, and bottlenecks everything through one final vector. Each section here is a fix for one of those problems. Put them together, and the path from "RNN" to "transformer" looks less like a leap and more like a series of obvious next steps.

Stacking RNNs for Depth

The first refinement is the easy one. Nothing says an RNN's output has to go straight to a prediction. You can feed the entire output sequence of one RNN as the input sequence to another. Then another. These are stacked RNNs (also called deep RNNs), and they usuall
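
The stacking idea described above, where one RNN's full output sequence becomes the next RNN's input sequence, can be sketched in a few lines of plain Python. This is a minimal illustration, not the post's own code. It assumes an Elman-style update with a tanh nonlinearity and random, untrained weights, and the names `make_layer`, `run_layer`, and `run_stack` are hypothetical helpers introduced only for this example.

```python
import math
import random


def make_layer(input_size, hidden_size, rng):
    """Randomly initialise one simple RNN layer (assumed Elman-style, untrained)."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {
        "W_x": matrix(hidden_size, input_size),   # input-to-hidden weights
        "W_h": matrix(hidden_size, hidden_size),  # hidden-to-hidden weights
        "b": [0.0] * hidden_size,                 # bias
    }


def run_layer(layer, inputs):
    """Run one layer over a sequence; return the hidden state at every time step."""
    hidden_size = len(layer["b"])
    h = [0.0] * hidden_size  # initial hidden state
    outputs = []
    for x in inputs:
        # New state depends on the current input and the previous state.
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(layer["W_x"][j], x))
                + sum(w * hi for w, hi in zip(layer["W_h"][j], h))
                + layer["b"][j]
            )
            for j in range(hidden_size)
        ]
        outputs.append(h)
    return outputs


def run_stack(layers, inputs):
    """Feed each layer's full output sequence to the next layer as its input."""
    sequence = inputs
    for layer in layers:
        sequence = run_layer(layer, sequence)
    return sequence


rng = random.Random(0)
layers = [make_layer(3, 4, rng), make_layer(4, 4, rng)]  # two stacked layers
sentence = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]  # 5 steps, 3 features
top = run_stack(layers, sentence)
print(len(top), len(top[0]))  # one 4-dimensional state per input step
```

The second layer never sees the raw inputs. It reads only the first layer's hidden states, one per time step, so the sequence length is preserved while the representation changes at each level.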
Source: Dev.to (dev.to)