Text Classification in Keras (Part 1) -  A Simple Reuters News Classifier


Hey, what's happening? Hunter Heidenreich here, and today we're going to get you up and running with Keras really fast. We're going to build your first natural language processing model, on a topic classification task, and by the end of this video you'll have your first model written and ready to roll.

Let's dive right into what we're doing today. I'm writing in a Jupyter notebook so you can follow along and type with me. We'll start by importing Keras, and then we'll import the dataset we're going to use: the Reuters newswire dataset, which is a collection of news articles divided into 46 different topics. We're going to write a very simple classifier for these topics. (Already got a typo, no worries.) If this is the first time you've run these lines, then when we actually go to load the data, which we do by calling reuters.load_data, Keras will go ahead and download the dataset to your machine.

We want to give a couple of parameters to this function. The first one is num_words, and we're going to set it to None, which means we are not putting a cap on the vocabulary. If we set this parameter to a number, it caps the words used in the dataset to that many of the most frequent words and maps the rest to the same out-of-vocabulary token. For example, if I put in 100 here, it would only load the 100 most frequently used words. We don't want that; we want all the words, so we set it to None. The other value we give this function is test_split: of all the articles we have, how are we dividing them up between training data and test data? We're going to use 20% of the data as our test set.
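The loading step described above can be sketched like this. The video imports the standalone `keras` package; this sketch assumes the TensorFlow-bundled Keras (`tensorflow.keras`), where the same `reuters.load_data` API lives.

```python
# Load the Reuters newswire dataset (Keras downloads it on the first run).
# num_words=None keeps the full vocabulary instead of capping it;
# test_split=0.2 reserves 20% of the articles for testing.
from tensorflow.keras.datasets import reuters

(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words=None,
                                                         test_split=0.2)

print(len(x_train), 'training samples')
print(len(x_test), 'test samples')
print(max(y_train) + 1, 'classes')  # labels start at 0, so add one
```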
The other thing we want to load from the dataset is the word index. You'll see that we don't get this data as lists of words; we get it as lists of indexes. We can use reuters.get_word_index to get the original word index back, and that will be important for inspecting the data. When we run this, if you've never done it before, this is one more thing that will be downloaded.

Now let's go ahead and inspect some things about the data. First, the number of training samples, which we get by taking the length of x_train. Likewise, let's look at the number of test samples, and the number of classes. I told you it was 46, but let's see if I'm lying to you by calling max on y_train; that gets the maximum value in there, and because the labels start at zero, we add one. There we go: we can see that we have almost 9,000 training samples, about 2,200 test samples, and 46 different news classes.

Let's take a look at a training sample, x_train[0], and its corresponding label. Like I said, you get a list of numbers. This is what Keras calls a sequence, and each number is a word's frequency rank: 43 is the 43rd most frequent word, 10 is the 10th most frequent, 447 is the 447th most frequent, and so on. If we had set num_words here, it would have capped these values; we didn't want that, which is why we can see something like the 28,842nd most frequent word in this sample. This is good, this is what we want to see. But let's say we wanted to actually see some of these words; that's what the word index is for. Given a word, word_index gives back its index, so we could look up 'money' and see that it's the 236th most frequent word. But what if we want to go the other way around, from index to word? To do that, we're actually going to need to
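The word-index lookup looks like this, again assuming `tensorflow.keras`. (The rank printed for 'money' depends on the dataset version; 236 is what the video reports.)

```python
# get_word_index returns a dict mapping each word to its frequency rank.
from tensorflow.keras.datasets import reuters

word_index = reuters.get_word_index()
print(word_index['money'])  # the frequency rank of 'money'
```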
reverse this dictionary. So we'll build index_to_word: for the key-value pairs in word_index.items(), we iterate over them and set index_to_word[value] = key. This flips the dictionary around, so now if we look up index_to_word[236] we get back 'money'. Let's go ahead and look at word 3 as well. There we go.

Now we can actually convert our training sample back: we do a join over a list comprehension, [index_to_word[x] for x in x_train[0]]. This takes all the indexes in our first training sample and feeds them through index_to_word. And there we go. We can see that some of these words are shortened and a little bit funky, and the reason is that this dataset has been preprocessed. That doesn't affect us for what we're going to do with classification, but it's nice to see. And if we want to look at, say, the seventh training sample, we can do that too, so we can browse these different training samples and see roughly the original data behind them.

Let's go ahead and move forward and prepare our data for actually training the model. To do that we're going to use something from Keras's preprocessing library called the Tokenizer. Before we do, we'll set a maximum number of words, which we'll just set to 10,000, and build out our tokenizer, giving it a num_words argument of max_words. Then we'll convert our training and test data into the format we need: for x_train, we convert from sequences to a matrix with sequences_to_matrix, passing x_train and mode='binary'. What binary mode does is basically ask, for a given sequence: is the i-th most frequent word present? If yes, set that index to one; if not, set it to zero. So it's just a one-hot encoding of whether each word appears in the sample. We'll do the same for x_test. Then, what we'll do to our actual labels is take them from
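The decode-and-vectorize steps above can be sketched as follows. Two hedges: this assumes TF 2.x, where `keras.preprocessing.text.Tokenizer` is still available (newer Keras versions have removed that module), and it uses `.get(..., '?')` when decoding because the raw indexes are offset from the word index (part of why the decoded text looks a bit funky).

```python
from tensorflow.keras.datasets import reuters
from tensorflow.keras.preprocessing.text import Tokenizer

(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words=None,
                                                         test_split=0.2)

# Flip the word index so we can map an index back to its word.
word_index = reuters.get_word_index()
index_to_word = {}
for key, value in word_index.items():
    index_to_word[value] = key

# Decode the first training sample back into (preprocessed) text.
print(' '.join([index_to_word.get(x, '?') for x in x_train[0]]))

# Vectorize: in 'binary' mode, column i is 1 if word i occurs in the sample.
max_words = 10000
tokenizer = Tokenizer(num_words=max_words)
x_train = tokenizer.sequences_to_matrix(x_train, mode='binary')
x_test = tokenizer.sequences_to_matrix(x_test, mode='binary')
```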
integers and turn them into one-hot encoded vectors. We'll use a Keras utils function called to_categorical, giving it our training labels and the number of classes, and we'll do the same for y_test as well. We can go ahead and run this. Beautiful.

Now when we look at x_train, let's inspect the shape and one of the samples, and do the same for the labels. We can see that the number of samples matches between the labels and the actual data, that each sample is now of length 10,000 (capped at max_words), and that each label is of length 46, like before. The label 3 has turned into a one in the third index, and our sequence has turned into a 10,000-dimensional vector with ones where those words are present and zeros where they are not. Perfect.

So now that our data is ready to roll, we can go ahead and build our first simple model. We'll import from keras.models the simple Sequential model, which is a model with a linear stack of layers that we add one after another and feed our data through. If we wanted something more complex we could build that, but we don't need it; we're staying simple, just trying to get our feet wet. So let's go ahead and import some layers we'd like to use. We'll use Dense, a densely connected (fully connected) layer: a set of nodes that we feed our input into, do some multiplications, and take the output from. We'll use Dropout as a sort of regularization, so we don't overfit the training data. And we'll want Activation for some nonlinear activation functions.

So let's go ahead and build out our model. Instantiate it, boom, simple enough, and let's dive in and start adding some layers. The first layer we'll add is a Dense, fully connected layer; we'll make it 512-dimensional and give it an input shape,
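The label encoding is small enough to see in isolation. A self-contained sketch with made-up example labels (the real `y_train` is used the same way):

```python
# to_categorical turns integer class labels into one-hot vectors.
import numpy as np
from tensorflow.keras.utils import to_categorical

num_classes = 46
y = np.array([3, 0, 45])            # hypothetical integer labels
y_onehot = to_categorical(y, num_classes)

print(y_onehot.shape)               # one row per label, 46 columns
print(y_onehot[0][3])               # label 3 becomes a 1 in index 3
```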
since it is our first layer. We'll give it max_words, then a comma, and leave the rest blank, so that we can basically feed in a variable number of examples at a time; that will be our batch size.

Okay, now let's go ahead and add a nonlinear activation. We're going to use ReLU; if you don't know what that is, you can look it up, but it basically sets negative outputs to zero and passes everything else through as the identity function, which is what we want. Then we'll add some Dropout: during training we will forget 50% of the previous layer's output when calculating what will become our prediction layer, which is a Dense layer with the number of classes. Then we'll just add an Activation using softmax, which basically exponentiates the outputs of this final layer, sums them together, and divides each output by that sum, turning the layer into a probability estimate of how likely this sample is to belong to each category. So we take the maximum; let's say it's the third entry, then we've classified the sample as the third category, and if its training label is also three, we just got it right. So there we go, that constructs our model.

Now what we need to do is actually compile our model. In the compilation step we define a loss function, we define our optimizer, and we define any metrics that we're going to use. For a loss, we're just going to use the categorical cross-entropy loss: categorical because we are categorizing, and cross-entropy because it's the simple cross-entropy loss function that allows us to backpropagate quickly. For the optimizer, we'll just use the default parameters of the Adam optimizer, and for our metrics we're only going to care about accuracy. Once we compile our model, we can actually take a look at the metrics it has been compiled with, to see what it's going to output for us when it evaluates: a loss and an accuracy. Perfect, let's go
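Putting the layers and the compilation step together, a sketch of the model (again assuming `tensorflow.keras`; newer Keras versions prefer an explicit `Input` layer over the `input_shape` argument shown here, which matches the video):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation

max_words = 10000
num_classes = 46

model = Sequential()
# input_shape=(max_words,) fixes the feature dimension and leaves
# the batch size variable.
model.add(Dense(512, input_shape=(max_words,)))
model.add(Activation('relu'))       # negatives -> 0, rest passes through
model.add(Dropout(0.5))             # drop 50% of activations during training
model.add(Dense(num_classes))       # one output per topic
model.add(Activation('softmax'))    # normalize outputs into probabilities

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()
```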
ahead and actually train our model; we're ready to train. We'll define the batch size and set it to 32, and let's train for two epochs, just to see what happens. To actually train the model, we'll take the output and store it in history, and call model.fit. This is the function that will fit our model to the training data, which makes sense. We give it x_train and y_train, then we'll also want to give it our batch size, set from the variable, the number of epochs we're going to train for, a verbose parameter to give us an output, but a reduced one, and then a validation_split of 0.1, so we take 10% of our training data and use it to validate that our model is actually learning. Then we want to score our model by evaluating it on the test data; again we'll give it the batch size and a verbose parameter, which we'll set to 1, and then we'll print out the score, score[0], and the accuracy, score[1].

So we can run this block of code, and we can see our model has begun to train: the accuracy going up, the loss going down, and an ETA for when this epoch is going to finish. We can also see that we're using 10% of our data to validate. And there we go: after the first epoch the training accuracy is 71% and the validation accuracy was 79%. Now we've got one more epoch to run; our training accuracy keeps going up, and we can see our final training accuracy was 88.59%, validation was 81.76%, and then our actual test accuracy was about 80.4%. So that's pretty cool: we just trained a model that, on these 2,246 test samples, classified that fraction of them correctly. Let's see how many that is: about 1,800 of them classified correctly. And that's with a very simple model and a simple training scheme; there's a lot more that we could iterate on and improve, and we will try to improve it in future videos. So that's all I have for today. There
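The whole pipeline, end to end, can be sketched as below. Same hedges as before: this assumes TF 2.x with `keras.preprocessing.text.Tokenizer` available, and the exact accuracies will vary from run to run (the video reports roughly 80% test accuracy after two epochs).

```python
from tensorflow.keras.datasets import reuters
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation

max_words, num_classes = 10000, 46
batch_size, epochs = 32, 2

# Load and vectorize the data, as in the earlier steps.
(x_train, y_train), (x_test, y_test) = reuters.load_data(num_words=None,
                                                         test_split=0.2)
tokenizer = Tokenizer(num_words=max_words)
x_train = tokenizer.sequences_to_matrix(x_train, mode='binary')
x_test = tokenizer.sequences_to_matrix(x_test, mode='binary')
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)

# Build and compile the model.
model = Sequential()
model.add(Dense(512, input_shape=(max_words,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# Train with 10% of the training data held out for validation.
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_split=0.1)

# Score on the held-out test set.
score = model.evaluate(x_test, y_test, batch_size=batch_size, verbose=1)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
```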
was a lot of content, so if anything didn't make sense, or needs clarification, or you want to ask questions about anything we did today, please leave me a comment. I'm more than happy to have a discussion, point you in the right direction, and see how we can get you on track. And I hope that you'll stick around and maybe check out the next video. If you liked this video: I'm actually running a GoFundMe campaign to help fund my machine learning research and the tutorials I'm working on, trying to get a better machine. So if this video was helpful to you in any way and you want to check out my GoFundMe, and maybe throw me a dollar or two, that would be awesome. But if not, I hope you stick around and check out some more of my videos.


15 thoughts on “Text Classification in Keras (Part 1) – A Simple Reuters News Classifier”

  • Manuel Rios Beltran says:

    This is a very imbalanced dataset, isn't it?
    Class 3 has 3159 samples whereas there are classes with 30 samples or less.
    I think that we should use another metric to measure our model; metrics like precision and recall would be useful in this kind of scenario.
    Finally, I think that it would be nice to attack the imbalanced dataset using the class weights parameter on the .fit method.
    Great video BTW

  • Daniel Weikert says:

    Hunter can you elaborate on why you have to use the sequences_to_matrix? Padding was already done on the dataset I guess?

  • yashvardhan nevatia says:

    Hey, where can I get the class lables from ?
    When I try to classify a new input, model.predict_classes(input) gives me 4. But I don't know what that means.

  • Do you have the code posted online?
    Also, is this truly multilabel classification? I noticed that the only example that you printed had only 1 label in it one-hot encoded form.

  • Can this model learn word n-grams from the list of word indexes? Or should one take care of n-gram tokenisation first, and then one-hot encode them?

  • Thanks, your video helped me a lot for the job interview assignment I had! I wonder – you use both split_validation and a test set x_test, y_test. Isn't this a redundancy? From what I understand they just DISPLAY to you how good the performance of the neural network is. Or are there some additional features that I am missing, like the neural network is improving automatically based on validation_split.

Leave a Reply

Your email address will not be published. Required fields are marked *