[Lesson 5] IMDB dataset shuffled

I was going through the lesson 5 notebook and saw that it says “We download the reviews using code copied from keras.datasets:”. I decided to check the docs at https://faroit.github.io/keras-docs/1.1.1/datasets/#imdb-movie-reviews-sentiment-classification and found that the code there is actually different. I used the code from the docs instead, and when I tried to see what a review actually said, I got this:

“the of bernadette mon they halfway of identity went plot actors watch of share was well these can this only coe ten so failing feels only novak killer theo of bill br gretal would find of films saw grade about hated it for br so ten remain by in of songs are of sahib gigantic is morality it’s her or know would care i i br screen that obvious plot actors new would with paris not have attempt lead or of too would local that of every their it coming this eleven of information to concocts br singers movie was anxious that film is under by left this troble is entertainment ok this in own be house of sticks worker in bound my i i obviously sake things just as lost lot br comes never like thing start of obviously comes indeed coming want no bad than history from lost comes accidentally young to movie bad facts dream from reason these honor movie elizabeth it’s movie so fi implanted enough to computer duo film paraphrasing almost jeffrey rarely obviously snag alive to appears i i only human it gildersleeve just only hop to be hop new made comes evidence blues high in want to other blues of their for concludes those i’m 1995 that wider obviously message obviously obviously for submarine of bikinis brother br singers make climbs lit woody’s this estimated of blood br andy worst cavil it boyish this across as it when lines that make excellent scenery that there is julia fantasy to repressed notoriety film good br of loose incorporates basic have into your whatever i i gildersleeve invade demented be hop this standards cole new be home all seek film wives lot br made critters in at this of search how concept in thirty some this pliers not all it rachel are of boys war’s re is incorporates animals deserve i i worst more it is renting concerned message made all critters in does of nor of nor side be nykvist center obviously know end computer here to all tries in does of nor side of home br be indeed i i all it officer in could is performance buffoon fully in of shrimp br by br 
sniveling its tatsuhito lit well of nor at coming it’s it that an this obviously i i this as their has obviously bad dunno exist countless conquers mixed of attackers br work to of run up meteorite attackers br dear nor this early her bad having tortured film invade movie all care of their br be right acting i i dictator’s of tatsuhito mormons it away of its shooting criteria to suffering version you br singers your way just invade was can’t compared condition film of camerawork br united obviously are up obviously not other just invade was segel as true was least of hiyao certainly lady poorly of setting produced haim br refuse to make just have 2 which indefinitely of resigned dialog stuntmen br of frye say in can is you for it wasn’t in singers as by it away plenty what have reason zones are that willing that’s have 2 which sister thee of important br halfway to of took work 20 br similar more he good flower for hit at coming not see reputation”

This does not make any sense. So I switched to Jeremy’s code and got the same output that Jeremy shows in the video. I think the difference is due to the shuffle parameter, but is that it, or am I doing something wrong? I looked for an option to turn the shuffling off and could not find one. Is there one? And why would they shuffle the words within a sentence? Shuffling the rows makes sense, but shuffling the order of the words does not.

Hey Aseem,

I bumped into this issue too a while ago. Here’s a very good explanation and solution(s):

Hope it helps :)
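
For anyone landing here later, here is a minimal sketch of one likely cause. It rests on an assumption: that the words are not shuffled at all, and that the scrambled text comes from an index offset. With its documented defaults (`start_char=1`, `oov_char=2`, `index_from=3`), `imdb.load_data` shifts every word id up by 3, while `imdb.get_word_index()` returns the unshifted ranks. Looking the ids up directly therefore lands on the wrong words. The `decode_review` helper and the toy vocabulary below are hypothetical and stand in for the real dataset so the snippet runs without a download.

```python
def decode_review(encoded, word_index, index_from=3, start_char=1, oov_char=2):
    """Turn a list of integer ids back into words, undoing the loader's offset.

    Hypothetical helper; the defaults mirror the documented defaults of
    keras.datasets.imdb.load_data.
    """
    # word_index maps word -> rank; the loader stores rank + index_from.
    id_to_word = {rank + index_from: word for word, rank in word_index.items()}
    # Ids below index_from are reserved markers, not real words.
    id_to_word[0] = "<pad>"
    id_to_word[start_char] = "<start>"
    id_to_word[oov_char] = "<oov>"
    return " ".join(id_to_word.get(i, "<unk>") for i in encoded)


# Toy stand-in for imdb.get_word_index() (assumption: the real one has the same shape).
toy_word_index = {"the": 1, "and": 2, "a": 3, "movie": 4, "was": 5, "great": 6}

# "the movie was great" as the loader would encode it with the default offset:
# start marker (1), then each rank shifted up by 3.
encoded = [1, 4, 7, 8, 9]

# Naive lookup without the offset: the ids land on the wrong words.
rank_to_word = {rank: word for word, rank in toy_word_index.items()}
naive = " ".join(rank_to_word.get(i, "?") for i in encoded)

fixed = decode_review(encoded, toy_word_index)

print(naive)
print(fixed)
```

With the real data you would pass `imdb.get_word_index()` and a row of the training set instead of the toy values. If the offset really is the cause, the naive decoding should begin with "the", as the garbled review above does, because the start marker `1` gets looked up as the most frequent word.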