Google I/O 2012 – Breaking the JavaScript Speed Limit with V8

Google I/O 2012 – Breaking the JavaScript Speed Limit with V8


Hello. Welcome to Breaking the
JavaScript Speed Barrier with V8. My name is Daniel Clifford. I’m the tech lead and manager
of the V8 team. If you’d like to follow along in
the slides, there’s a link here on the first slide. It might help for you in the
back rows to read the smaller fonts in some of the
code slides later. I’d like to talk to you today
about optimizing the performance of your
web application. Now this could mean a lot of
different things to a lot of different people. This could mean optimizing your
WebGL code to get more frames per second in the game. This could mean reducing the
number of network round trips in a client server
application. But what I’d like to talk to
you about today is Raw JavaScript Execution Performance
with V8, the JavaScript engine inside
of Chrome. The V8 project has recently
moved to the development center, the Google Development
Center in Munich, Germany. And as you may know, Germany is
the home of the Autobahn. And what you may not know is the
Autobahn is not actually a single stretch of road from
Hamburg to Munich that runs straight through the middle
of the country. But rather it’s a network
of roads. And these days, most
of the Autobahn actually has a speed limit. But there are still a few
stretches of the road that don’t have any speed
limit whatsoever. And you can go as fast as you
want, as fast as you can. And so, as a resident of Munich
and a resident of Germany, I feel especially
qualified to speak about moving at speeds that would get
you ticketed here in the United States. You might ask the question, who
cares about performance? JavaScript performance
is fast enough. This is something that only game
developers need to worry about, or the developers
of highly interactive applications. But I’m here to tell you that
you should care about performance. Everybody who develops a web
application should care about performance. And the reason is, is that JavaScript performance matters. It still matters. It’s not just about making
your current application run faster. It’s about enabling things that
you have never been able to do in the past. Every cycle that you win back
when you do performance optimization, you can invest
in something new that you haven’t been able
to do before. An interesting comment about
these pictures, they were taken on the same street,
only a few feet away. I think it’s interesting that
in the same place, given a slightly different perspective,
you can have a completely different
view of the world. So let’s see. Let’s talk about breaking
the speed limit today. I’d like to use a checklist as
the basis of how you optimize your web application. It’s a relatively
straightforward model. But it’s very powerful. And the first step of the
checklist is to make sure you are prepared before you
have a problem. This is the piece that is
the most expensive. This is the one that takes
the most amount of time. And I’ll be spending the
most amount of time talking about it today. But this is the one that you
have to do to make the second two steps possible at all. The second step is to identify
and understand the crux of your problem. And if you know how
a system works. If you prepare before you go
into the second step, it makes it much easier to find
your problem. The third step, you
fix what matters. And hopefully this is the step
that’s actually really easy after you’ve done
the first two. Now, this is a relatively
simple model. But, like I said, it’s
very powerful. And you can even apply it to
real world situations. So I’d like to tell you a story about using the checklist. And I like to do recreational
mountain climbing. I’m not very good at it. But I have a friend
who is very good at it, Michael Stanton. And he’s taught me
everything I know about mountain climbing. And I trust him on
the mountain. I trust him with my life. And that’s how we found
ourselves in the summer of 2010 up here. And it was a beautiful day. We’d been on the mountain
all day. We hadn’t seen another soul. We had the whole mountain
to ourselves. And we were two rope lengths
from the top, from the summit. We were almost there. And I was belaying Michael. I was making sure that if
he were to fall that I would catch him. And he was leading ahead. And he climbed ahead of me
around a corner, out of sight. And I heard him struggling with
a particularly tricky climbing problem. Then I heard two sounds that I
will never forget in my life. And the first one was the
percussive sound of a rock coming loose. And the second sound was the
sickening thud of that rock hitting Michael. He fell. I called his name. But all I heard was silence. I realized in that moment that
I had a performance problem. And the checklist that I just
presented to you would be the key for me to get out
of that situation. I had to be prepared before I
ever got on that mountain. I had to make sure to identify
the problem that I had in that very moment, the key
problem, so that I could fix it quickly. Somebody’s life could
depend on it. Now, I hope you never are in a
situation like that, that your life depends on applying
the checklist. So why don’t we go to an
example, which talks about a less life threatening
performance problem. Let’s talk about JavaScript. So I have put together a sample
problem that I’d like to talk about throughout the
course of this talk. And it is a toy problem that
I’ve come up with. But I think it’s representative
of some of the things that– some of the performance problems
that you might face in your own code. And the problem is to compute
the 25,000th prime. Pretty simple algorithm here,
you start at one. You count up. And you take each candidate
number. And you look in a list of primes
that you’re maintaining and see if any of those primes
can be divided into that candidate number. And if it can, then
the candidate number is not a prime. If none of the prime numbers go
into that candidate number, then we have a new prime
and we add it to list. And I wrote some code in
C++ and in JavaScript. Don’t sweat the details
of this code. I just wanted to show you
that it’s about the same amount of code. And I tried to do a pretty good
job of porting the code from JavaScript to C++. So it does the same thing. And then what I did is I
actually ran the code. I compiled the C++ code
with GCC and ran it. And you can see it took about
three seconds to compute the 25,000th prime. Then I took the JavaScript
code. And I didn’t run
it in browser. What I did was I took the
debugging shell that’s available as part of the V8
open source code project. And I compiled that and ran
the JavaScript with the debugging shell. The debugging shell makes
it possible for you to concentrate on JavaScript
specific problems. So for this presentation, I’ll
refer to the debugging shell in most of my examples. You’ll see that when I ran the
prime code in JavaScript, it took 15 seconds. So JavaScript’s about five
times slower than C++. That’s not so bad, right? Of course it is, yeah. So I just made a big mistake. I’ve told you about
this checklist. And I just completely
ignored it. I was not prepared. I don’t have enough information
to make judgment about whether my code is bad or
good, whether that number is something I should
expect or not. I didn’t even try
to analyze any potential performance problem. And I didn’t even get anywhere
close to fixing any problem. I’d like to show you
how V8 has improved since Chrome came out. This is V8’s score on the V8
benchmark, version seven. And you can see that it’s
got a lot better since we launched Chrome. And we’re now at the position
with V8 that we can actually make a comparison against the
performance of C and C++ code. You can expect to be in the
ballpark for certain code. So don’t give up here. When you’re in the same
situation, where you say, we’re slower than C++. Or I’m slower where
I want to be. Don’t just give up. Apply the checklist that
I presented to you. And see how far you can get. Demand that your application
is faster. So how much faster do you
think we can get the JavaScript code to run? Who here thinks we can do it– we can get it three and
a half times faster? All right, we’ve got
maybe 5, 10 people. 35 times? Takers? OK, also a couple of people,
I know you know how this game works. 350 times? OK, one person, the V8 PMs. [LAUGHTER] 3,500? All right, we’ve got to get at
least one brave taker here. So I think the most votes
were for 35 times. Let’s apply the checklist and
see how far we can get. Now I told you the first step
was to be prepared. And this means we’re going to
dive into some technical details of V8, so you can
understand how it works. So fasten your seat belts. This is where things get
a little bit racy. So what does prepared
mean for V8? I mentioned this before. You need to understand
how V8 optimizes your JavaScript code. You need to understand how V8
actually works so that you can help it optimize your code. Because once you understand how
V8 works, then when you actually write your JavaScript
code, you can keep those things in mind and make sure
that you don’t make some of the mistakes that
will fool V8. Make sure you learn the tools
that are available to you. There’s some great tools in
Chrome, the dev tools. And V8 has some built-in tools
in the debugging shell that can really help you understand
what’s going on in your JavaScript and help you
make it better. All right. So the first thing I’d
like to talk to you about is hidden classes. So JavaScript doesn’t
have explicit types. And this presents a bit of
a challenge to a language implementer, because type
information is very valuable in specializing generated code
to be very efficient. And if you don’t have that type
information, it makes it much more difficult. And it’s hard to gather this
information efficiently and quickly when you’re compiling
JavaScript code because compilation is part of
the execution time of a JavaScript program. Every cycle that you spend
compiling a JavaScript script before you execute it is a cycle
longer than it takes to actually get to doing
the work. So we have to find a way
to get executing code as fast as possible. And it makes it difficult to do
expensive reasoning about types at compilation time. And that makes it extremely
difficult to get it close to the performance of C++. However, there are
some tricks. Hidden classes help make V8
run JavaScript faster. V8 internally creates hidden
classes for objects at run time, not at compile time. And that’s how we get around
the problem of compiling upfront, getting the information
upfront, because we gather the hidden class
information as we go. And here’s a key insight. And this is one that we’ll
repeat throughout the presentation. Objects with the same hidden
class can use the same optimized generated code. And that means if you help teach
V8 through the way that you structure your code, which
object has the same hidden class, you can help it to
generate optimized code for those cases. I’d like to show you
how this works in a very simple example. So I’ve got a constructor
function here for an object point. And I’m going to instantiate
it twice. And let’s see actually what
V8 does behind the scenes. So the first time the
constructor gets called, V8 creates a new hidden class. So what you see here is– on the
right hand side– is you see the new point object. The first entry in the point
object is a hidden class to an empty point hidden class. And as we assign to the members,
in point, what V8 does is actually realizes, ah,
you’re changing the hidden class by adding a
member to it. And we create a new
hidden class. And the point object
gets assigned to this new hidden class. And if we had another member– in this case y– we get yet another
hidden class. And you see what happens here
is that not only does V8 create new hidden classes as
new members are added. But it keeps track of what had
to happen to get from one hidden class to another one. So you see we went from empty
point object to a point object with an x, to a point object
with an x and a y. And that’s important because the
second time that we call the point constructor, this
hidden class structure is already there. And as the code is executed to
add the members x and the members y, you’ll see
that the object also gets new hidden classes. But it gets the same hidden
classes that we created the first time through. So that at the end, we actually
have P1 and P2 having the same hidden class. The problem is, is that if we
add a member to P2 that we haven’t added to P1– after the fact, outside
of the constructor– what happens is that we
create yet another hidden class for P2. And it is different from
the class for P1. So they are now– we’re not able
to use the same optimize code for both of
those objects. So be aware that by doing
something like this and creating hidden classes, you’re
defeating some of the optimizations that V8 can do. I’d like to talk about speed
traps to avoid in this presentation. For the most part, all of the
advice I’m going to give you today is something
that is eternal. You can assume that the tips
that I give you today you can use forever, except for a couple
of cases where I’ll say, this is something that’s
temporary, something that we’re working on and we’re
trying to make better. So let’s get to the first speed
trap, which is make sure to initialize all members in
your constructor functions. This comes from the example I
just showed you, which is you need to teach V8 which hidden
class objects should have after they’re constructed. And the way to do that is by
putting all of the member initialization in
the constructor. Always initialize object members
in the same order. This follows from that example
that I gave you. If you add members in different
orders, you create a different tree of
hidden classes. And at the end, you’ll have
objects with two different hidden classes that can’t use
the same optimized code. Next thing I’d like to talk
about are numbers. So because we don’t have a lot
of type information in JavaScript, we have
this dilemma. We want to be able to generate
code that can handle all possible different values, all
different possible types. But you want to be able
to have an efficient representation of numbers. And how do we do this? Well, we use a technique
called tagging. So inside of V8 we pass
around values of 32-bit numbers and objects. But we want to be able to
use the same 32 bits to represent both. And that way we can have one
code path that can handle, in many cases, the objects
and integers. So what we do is we use
the bottom bit. And each of the values have
a special meaning. If the bit is set, it’s
an object pointer. If it’s clear, it’s what we
call small integer or smi. And that’s a 31-bit
signed integer. Now if you have a numeric value
that you’re passing around, assigning to a member
that is bigger– it’s a numeric value that’s
bigger than 31 signed bits– then it doesn’t fit in
one of these smis. And we have to create what’s
called a box for it. We box the number. We turn it into a double. And we create a new object to
put that number inside of it. And what follows from that is
the next speed trap to avoid, which is make sure, whenever
possible, you use 31-bit signed numbers for performance
critical calculations. Because if you exceed 31 signed
bits, we will have to convert it to a double value. Now there’s some optimizations
that we have that doesn’t make it as bad as it seems. But there are cases where that
process causes an allocation. And that means that some
mathematical operations can cause allocation. The next thing I’d like to
talk about is arrays. This brings some of the elements
together of the previous two sections. In JavaScript, you can have
very large arrays. And you can have arrays that
are sparse, that don’t have every element inside of them. And so V8 actually has two
different methods to handle two different types of arrays. The first one is
fast elements. And fast elements, they have
a linear storage buffer. And V8 uses them if the
set of keys for the array is very compact. If that’s not the case– so you
have a very sparse array– then it uses a different
format, which is dictionary elements. Fast elements, linear buffer,
very fast and efficient to access. Dictionary elements,
it’s a hash table. Much more compact but
also more expensive at run time to access. Kind of obvious here, if you
want to have your arrays be very fast, make sure that
V8 uses fast elements to represent them. And the way to do this is to
make sure that your keys start at zero and that they’re
contiguous. So don’t create an array
and start using index number 25,000. Start at zero and go up. Also, don’t allocate really
large arrays that you’re not going to use everything in. So if you allocate an array with
65,000 elements and you only use the first five, V8
thinks, well, that’s a very sparse array, and will switch
into dictionary mode, which means the access of the elements
will be slower. Don’t delete elements
inside of arrays. Same principle here, because
when you delete elements, it makes the key set sparse. And it can lead to V8 switching
the elements to dictionary mode. Don’t load from uninitialized
or deleted elements. This is an important one. It’s easy to get wrong. Here’s an example. Your code will work. You won’t notice. And it won’t make a difference
in the output. However, it will make
things a lot slower. So here’s an example of
doing exactly this. I have a loop here that
accesses an array. And it uses the or equals
operator, which means it takes the existing value in the array,
ors it to something and then writes it back. But the first time that you do
that, you’re accessing an empty array. And when you do that, the
JavaScript spec says, that’s an undefined value. But that’s OK. It can be converted to zero. And it’s exactly what
it does here. It converts that zero to do the
or equals and then stores it back in. But that process of conversion
means that this operation has to do more checking. And it’s more expensive. And by simply initializing the
value before you use it– that’s the second
example here– you can make your code,
in this case, about twice as fast. So what about arrays
of doubles? Remember I talked about tagging,
that we have to wrap double numbers inside
of special objects. So if you have an array of them,
doesn’t that make it expensive because each element
has to have this wrapper object to store the
double value. Well, there’s a trick that we
can use there too as well. So we have a mechanism
called unboxing. When you unbox a double, you
take the 64-bit IEEE number inside of this double object. And you take it out. And you put it into
a linear buffer. So what we can do is track in
the hidden class of an array the types of elements that are
contained in the array. And if the array contains
only double values, it becomes unboxed. And what that means is that it
takes all of those numbers out of the heap objects, out of the
wrapper objects, and lays them out in a linear
buffer of doubles. But again, it’s only if there
are double numbers inside of the array. This causes the hidden
class change. Generally, that’s good because
if you convert the double numbers into a flat buffer– if you have array of doubles
and it’s in a flat representation, it’s very
efficient to access that. And when you store into it,
it doesn’t require any allocation. But you must be aware that
converting between the array that’s boxed and unboxed
has a cost. And it changes the
hidden class. And careless manipulation of
arrays can lead to a lot of extra work through this
boxing and unboxing. I’ll give you an example
of that. Here’s a pretty simple
snippet of code. I create a new array. And I sign a bunch
of values to the first couple of elements. Looks like there’s not a whole
lot of work to do here. But because I create an empty
array, the initial object that I have– you’ll see– doesn’t actually have
any elements at all because it’s empty. So the hidden class says,
hey, I’m an array. And I’m optimistically going
to assume it contains smi values and that I’m not going to
have any deletes, no holes. We can do that because
the arrays empty. So it’s actually a
true statement. It only has smis because
it has nothing in it. What happens when we assigned
the first element of the array, because we didn’t have
a backing store for the elements yet, we have
to allocate it. And by default, we allocate
a buffer that has room for four elements. Now that the backing store is
actually there, we can store the first element in here. And, again, this is stored as a
smi, as one of these compact 31-bit signed integers. And now that we have space, we
can also assign the second element, also not a problem. However, the next assignment
assigns a double value. And what this does is forces
V8 to convert the format of the backing store, because we
can’t store a double value directly into the array. But what we could do is we could
convert the existing integer values to doubles. And then, with the unboxed
version of this array, we could store the 0.5 value. And that’s exactly
what V8 does. It reallocates a new buffer. This time each of the elements
slots is 64 bits. It converts the existing small
integers into double values. And then it assigns the 0.5
to the third element. You’ll also notice that a hidden
class change happened here, because what we did
is added a double value. So it’s no longer true that
the array only has smis. We now have doubles in there. That means that the hidden
class has to be changed. It causes a hidden
class change. Something similar happens when
we try to assign true. So true, there are certain
objects in JavaScript which are kind of oddballs. We actually call
them oddballs. True, undefined, null, they
are not really objects. But they kind of act
like objects. But they’re certainly
not numbers. So when I assign true to the
third element or the fourth element here, it causes another
allocation because we can no longer represent this
new value in this unboxed double array. So we have to convert it back
into its tagged format, which means converting the elements
that can be represented as smis back into smis. It means reboxing the
double value. And now we have a backing store
that we can actually store the true value in. That’s a lot of work. This is a very short
piece of code. But we did four allocations,
two array, two hidden class changes for just a couple
lines of code. There is a better way. If you use an array literal,
this is a hint to V8 upfront what the values are going to
be inside of the array. And given this example here, we
have a single allocation. All of this can be done
in one allocation. The correct backing store format
can be selected at compile time. And we don’t have all of these
conversions in the initialization of the array. So touched on this already. Speed trap, make sure, wherever
possible, you initialize arrays with array
literals, especially for fixed size, small arrays. Also for small arrays, make
sure to preallocate them before you use them. Now, again, this is different
than large arrays. Large arrays, remember, more
than 65,000 elements, you want to make sure you grow as you go
so that you don’t go into dictionary mode. For small arrays, even if
they’re not completely filled, even if they’re small, if
they’re sparse, if they’re small, V8 will use fast
elements for them. But only if you allocate the
correct size will you be able to avoid allocations when you
go from an empty array and store the first element
into it. Don’t store non-numeric values
in numeric arrays. So if you have an array of
double values, V8 can generate very good code for manipulating
those values, for those double values. But if you put an object into
the array, we have to box everything again. And the code that we generate
is a lot less efficient. OK, we’ve talked about hidden
classes and how they are an important piece of
optimizing code. Let’s actually talk about how
we generate code in V8. And this will maybe help you
understand a little bit better why it’s important to make sure
that you understand the hidden classes. So a V8 has two compilers. The first one is called the full
compiler, because it can actually implement the full set
of JavaScript features. It generates good code
but not great code. There’s an optimizing compiler,
which I’ll talk about later, which produces much
better code for most of the language. The goal of the full compiler
is to generate code quickly. I talked about before the fact
that any time you spend compiling code is added to
your execution time. So the full compiler strategy
is to get executing code as quickly as possible. It generates code that
is just good enough. But because of that,
it basically doesn’t do any type analysis. It doesn’t know anything
about types. And the way that it solves the
problem of implementing operations on different types
efficiently is a technique called inline caching
or inline caches. And what inline caches do is
they gather type information and run time. And they optimize the
code as you go. So you pay as you go. And here’s the way inline
caches work. An inline cache, or an IC, is
a type-dependent small chunk of code that knows how to do
an operation given specific hidden class– inputs of a particular
hidden class. And what they do is they
validate the type assumptions. They get a couple
arguments in. They check to see if the hidden classes are as they expect. And if they are, then they know
that the code that is in the IC is the optimal code for
implementing that operation. If not, something
has to happen. A new IC has to be generated
that can handle the new type information. And at runtime, if different
types are seen than what are expected, then through
backpatching the code is changed to use new specialized
code for the new types that are seen. I’ll give you an example
of how that works. Here’s a fragment from the code
example at the beginning, a prime number generator. I’d like to take a particular
fragment of code there, which takes an element from the
prime’s array and checks whether the candidate number is
divisible by that number. And let’s see what the
full code generator does with this code. Don’t sweat the details. I’m going to show some assembly
language here. And it’s not actually
important what the details are here. But I want to show the pattern
of what actually goes on here. So each expression generates a
little chunk of code here. You’ll see that there’s some
preparation code and then there’s a call out, a call to an
IC to actually do the work. Likewise, for the array element
access, there is some preparation work and then a call
out to the IC to do the work, and for the [INAUDIBLE]
the same thing. And the thing to notice here is
that, initially, the call is to this initialize version
of the IC, because we don’t know what the types
are going to be. We’ll only know that
at run time. And this is how it works
at run time. The code on the left here is a
small, almost impossible to read version of the code
I just showed you. And when we’re executing this
code for the first time, and we get to this IC, the first
IC, it calls an internal routine that initializes
the IC. It looks at what the arguments
are to that operation the first time. It checks the hidden classes. And it says, oh, I know
how to do that. I can generate specialized
code for that. It does so. And it backpatches the address
so that the next time you come to the code, it does the
same thing again. Assuming that the next time you
run, you’re going to have the same hidden classes, the
same types of objects, which is an optimistic but usually
valid assumption. It does the same thing for the
access of the array element and for the [? lot ?]
operation. And you’ll see that each of
the ICs, all they are is specialized code that know how
to do the specific operation for a given set of object
types that are input. This makes a big difference
in performance. And this is why you need to
understand how some of this stuff works with hidden classes,
because if you get it wrong, you don’t actually get
the performance improvement that you will see or should see
when using ICs and then, later, the optimizing
compiler. So if you’re running the sample
code that I showed at the beginning to generate primes
without ICs, you get an execution time that’s
over 600 seconds. When you turn ICs on– don’t have the optimizing
compiler on yet– it makes a big difference. So this is about a 20 times
speed improvement. Now I’d like to introduce a
concept that we’ll also come back to later. Monomorphic better
than polymorphic. It sounds pretty complicated. It isn’t really. Monomorphic operations are
operations that always work on objects with the same
hidden classes. So the first time you execute an
IC, we look at the argument types for the operation. We look at the hidden classes. And we remember them. And if the next time, through
those hidden classes, are the same, it’s monomorphic. And if the next time they’re
the same, it’s still monomorphic. And it stays that way. If at any time we see different
hidden classes for the arguments to an operation
then it becomes polymorphic. And it turns out that
monomorphic operations are easier to optimize. Or the code that we generate is
better than for polymorphic operations. So here’s an example. Let’s say we have a function
add that adds two things. And if I call that with two
integer arguments, it’s monomorphic. However, if I call it again with
two string arguments, I use the same operation–
this plus operation– to work both on integers
and on strings. And at that moment, it
becomes polymorphic. And it becomes more difficult
for V8 to generate good, really efficient code for
that plus operation. So the speed trap to avoid here
is, wherever possible, don’t try to mix operations or
objects of different types in operations at a particular
place in your code. Prefer monomorphic code
to polymorphic code wherever you can. I’d like to talk about the
optimizing compiler. We talked about the full
compiler, generates pretty good code, gets executing
as fast as possible. The optimizing compiler comes
back later for the code that’s really hot and recompiles
the hot functions. Why does it do this? Well, we’ve gathered information
about the types of the objects that we’re operating
on through the ICs. And the optimizing compiler can
take that information and make decisions about how to
optimize the code better. What happens is the optimizing
compiler takes that type information from the ICs. And it speculatively inlines a
version of the operation based on history, what it’s seen
before in the ICs. And it also means that it can
inline function calls that are monomorphic. This includes constructors. This is actually a new feature
that we’ve added recently. By enabling inlining, by
inlining this operation in the optimize code, other
optimizations become possible. And that’s what the optimizing
compiler will actually do. We’ll go over this inline code
and find more opportunities to make it faster. Let’s see how that works. Again, a warning, here’s
assembly code. It’s a very similar process to
the full code generator. We generate a little piece
of code for each of the expressions. But this time, you’ll notice,
as we generate each piece of code, there’s no call
at the end. And in fact, when we’re done,
it’s straight line code. So let’s compare the performance
using the optimizing compiler with the two
runs that we did before. You’ll see that with the
optimizing compiler, we’re even fast– we’re about
50 times faster than the original code. One thing that’s really useful
is actually to find out which functions are being
optimized by the optimizing compiler in V8. So there’s an option you can
pass to the DHL that will do just that. It’ll output the information
about which functions get optimized. However, there are certain
circumstances in which the optimizing compiler can’t
optimize a function. This is called bailing out. Or that’s what we call
it, bailing out. And the reason is there are
certain language features that are just not supported yet by
the optimizing compiler. The most obvious example
is try catch. So if you have a function that
contains a try catch block, the optimizing compiler, right
now, can’t optimize that. We hope to change that
in the future. But until then, there’s
a work around. If you have performance
sensitive code that you’d like to access from try
catch block, wrap it in its own function. Here’s an example. We have a function that’s
performance sensitive. We call that from the try piece
of a try catch block in another function. And by doing that, it allows
the optimizing compiler to optimize the performance
sensitive function, even though it can’t optimize
the try catch block. If you want to find out which
routines cannot be optimized by the optimizing compiler and
why, there’s an option you can pass through the debugging shell
called trace bailout. And that’ll actually tell you
the functions that could not be compiled and why. The last thing I’d like to talk
about is deoptimization. So the optimizing compiler
is pretty good at generating code. But there’s a catch. The optimizations that it
does are speculative. It makes the optimistic
assumption that what it has seen in the past is going to be
just like what it’s going to see in the future. And that usually pays off,
because usually your code keeps running like
it has before. However, as with all optimistic
assumptions, that may not be the case. So when you’re driving down the
Autobahn in Germany and you’re doing 200 kilometers an
hour, sometimes you’re working under the optimistic assumption
that you’re not going to be caught going
over the speed limit. Remember, I told you there are
some stretches of the Autobahn with a speed limit. And you’ll see out of the corner
of your eye, a red flash from two of these things
on the side of the road. And you’ve just been caught in
a speed trap, because your optimistic assumption that you
were not going to be caught speeding was wrong. And what do you do? You slam on the brakes. You slow down to the
speed limit. And you probably drive that
way for a while, slower. There’s a similar process
that happens in optimized code in V8. Invalid assumptions lead
to deoptimization. When one of the assumptions is
no longer true, when one of the hidden classes that you
get for an argument to an operation is not what you
expected, then the specialized code we generated is
no longer valid. And we have to deoptimize. What does that mean? That means we throw way the
optimize code, because we know the assumptions we made are just
simply no longer valid. We resume execution in the
right place in the full compiler code. So we can continue
our execution. But now it’s at a slower speed,
because we’re using the version of code that is
good, but not great. And a reoptimization of the
function might happen later, because we’re going to continue
to gather type information in the ICs in
the full compiler code. But for the short term, you’re
going to run more slowly. And what that means is that
deoptimization is something you want to avoid. And one way to do that is to
make sure that you don’t introduce hidden class changes
at a late time running your code. If you have types, object types
that you run in optimize code, and V8 chooses to optimize
that code, and then you later introduce new types,
that optimize code will deoptimize. And if you do that too
often, it has a cost. You can tell the debugging shell
to output the routines, the functions that it optimizes
on by passing the trace deopt option to
the debug shell. And it’ll output which one it
deoptimized on and why. Now most of you are doing web
development, I assume. And therefore, you probably
want to be able to look at your code in Chrome, as well
as directly in the DHL. You can pass these options
that I’ve mentioned also directly to Chrome using
the js flags option. And the output will all show
up in standard out when running Chrome. OK, so that was the
be prepared part. You now understand a little
bit about how V8 works. Let’s talk now about identifying
and understanding problems using that knowledge. What does that mean for V8? First of all, ensure the problem
that you’re looking at is a JavaScript problem. And this is sometimes
a little bit tricky. You could use the dev tools in
Chrome to figure out where you’re spending time in
your web application. It may look like JavaScript. But be careful it might be the
case that there’s also DOM interaction. So a really good way to do that
is, if you can take that code that you think is the
performance bottleneck, pull it out, and run it in the DHL,
that means it’s pure JavaScript. Because the V8 project is just
JavaScript, it doesn’t have– it’s integrated into Chrome. But it, in itself, is just
the JavaScript execution environment. So if you can take that code and
run it with the DHL, then you’re pretty– then you can analyze the
problems of JavaScript only performance problem. Collect metrics, measure
where you’re spending your time in the code. And then locate the bottlenecks
and fix them. So remember the example that we
had at the beginning, the prime number generator. If you run it in the shell and
pass an additional argument, the dash dash prof argument, it
will run the program while also sampling the execution
profile. And it’ll write a log that you
can then analyze with another tool to see where you’re
spending your time. Before we look at that output,
where we’re spending the time, I’d like to go over briefly
again where we’re spending the time– or what we’re doing
in the program. There’s a couple important
functions here in the prime generator. First, we add a prime. There’s a function to add a
prime to the existing list. There’s a function that
actually tests whether candidate number is divisible or
is divisible by one of the existing primes. And then there’s the main
routine, which does the high level algorithm. It starts at one, goes up,
checks that number, whether it’s divisible by an
existing prime. If not, it adds it to the prime
list, otherwise it goes to the next number. So what’s our expectation? The prediction is that we’re
going to spend most of the time in main. And the reason for that is all
properties and functions are monomorphic. I wrote the code
to be that way. All numeric operations
are smis. We do a pretty good job
at optimizing those. All the functions could
be inlined. And there are no deoptimizations
or bailouts. So my expectation is V8 is going
to do a really good job of optimizing this code. OK, so let’s take a look at the
output and see where we were spending our time. So this is the output of
the tick processor. It takes this output log
that’s written with performance data. And it prints it in
a pretty format. You see there’s two sections
of the output here. I’ve actually abridged this. It’s much longer. But these are the important
parts for our profile here. There’s a JavaScript section. This is the time we’re spending
in generated code. This is JavaScript code
executing in either the full compiler code or
optimize code. And then there’s the C++
section, which is– it’s often the case that you’ll
have C++ symbols in the profile, because there’s some
operations that we can’t do in generated code. So we have to call out. And as expected, we spend
a lot of time in main. But it’s actually about 25%. You’ll see there’s three
columns here. You see how many ticks you’re
spending in that routine, in this function. And you’ll see what the
percentage is of the overall execution time. But in this case
it’s only 25%. That’s weird because I
expected a lot more. V8 should do a really good
job on this code. If you look at the other items
that show up in this profile, this is weird. We’re spending time
doing a binary op stub with an oddball. An oddball, remember, I told
you those are things like true, undefined, and null. We don’t have that
in our code. That’s kind of strange. And we’re spending
a lot of time in library code doing modulo. And that’s kind of strange. We should be able to inline
that as well. I didn’t expect that. And then here’s another one
that’s kind of weird. We’re doing a conversion
from a number– extracting– we’re taking a number
and boxing it. And we really should be
generating very good code for this program. We’re spending way too much time
doing something that’s not executing the JavaScript. We’re calling out
to libraries. Something is very wrong
with this code. So can you spot the bug? The hint is, is that our array
is only of length prime count. And we’ve actually exceeded the
end of the array in our calculations. And in both the C++ and
JavaScript version, it doesn’t make a difference
in the results. And this is the thing that
you’ve got to watch out for, is that even though you don’t
see that there’s a problem in the results, the program’s
a lot slower than it needs to be. So let’s fix that. Let’s actually not do an
out-of-bounds array access. And we’ll re-run the profile
with the fix. And you’ll see this time, this
is what we were expecting. That small change now makes us
generate much better code because we’re not accessing
outside of the array, getting an undefined object– this
oddball– that we’re trying to do math on. And now we’re spending over
99% of our time in main. This is what we expected. So let’s see actually how big of
a difference this makes on the profile. And when we run the code against
the C++ code again, look at that. C++ code, three seconds,
JavaScript, 1.8 seconds– we’re 60% faster. Well, I forgot to optimize
the C++ code. So when you do that, C++
is still faster. But it’s 17% faster. It’s ball park. I have to take all of this
with a grain of salt. I chose an example where
I knew V8 would do a pretty darn good job. More and more, we’re optimizing
more cases that makes V8 faster in
more situations. So I expect in the future, we’ll
get closer and closer. But my point here is that
you need to make sure to take a look at– spend the time optimizing your
code to know where you’re spending your time, because
you can actually make your code very fast. So fix what matters. We did a lot of time analyzing
our existing profile, tweaking it, knowing how V8 works. The last thing I’d like to point
out is, make sure you don’t forget to optimize
your algorithm. So I have a new version of the
program here, where instead of iterating over all the prime
numbers, I only iterate to the square root of the candidate,
which is sufficient to actually test for whether it can
be divided by one of the prime numbers. And if I run that result– if I run that in the– I time that, then you’ll
see that now I’m under 5/100 of a second. And if we compare that with the
original version, which was 15 seconds, that’s
350 times speed up. So, in summary, keep your
eyes on the road. Be prepared. Learn a little bit about
how V8 works. Identify and understand the
crux of your problem. And fix the thing
that matters. You don’t have a whole lot
of time to do performance optimization. Make sure that every
minute counts. And my friend Michael is fine. He had taught me how to tie a
special knot, which freed my hands while he was hanging. I was able to call on my cell
phone the mountain rescue. Michael regained consciousness,
and I could lower him down. The helicopter came and took
us off the mountain. So even though he had a couple
broken ribs, he’s fine. Thank you. [APPLAUSE]

Author:

45 thoughts on “Google I/O 2012 – Breaking the JavaScript Speed Limit with V8”

  • Haha i was watching this video and i was wondering all the time, what happened to michael. I am glad that you told us at the end!

  • Vyacheslav Egorov says:

    example demonstrates two very important things: corner cases that V8 does not like (out-of-bounds reads, arithmetic with undefined value) and a technique that can be universally applied to optimize virtually any JavaScript code.

  • Vyacheslav Egorov says:

    JavaScript engines share some design decisions so most of them suffer from the same things. If you try this example in FF/Safari you will see that performance improves there as well. In C++ you can read out-of-bounds, whether you get an error or not depends only on a memory layout. In this particular case you never actually read out of bounds, you just read an uninitialized part of an array. And C++ actually does the same (even worse: prime_count is uninitialized).

  • 2:15 performance matters because "every cycle that you win back when you do performance optimisation you can invest in something new that you haven't been able to do before". what a hollow marketing phrase. "every cycle"? = clockcycle? if the goal is to do new things you haven't been able to do before, I would say first figure out what's required to allow you to do the new thing, usually it's actually writing the code for the new thing

  • so to clarify, i think this is just one of those cases where people think "working on performance is fun and interesting, now i just need to utter some phrases to justify it". don't get me wrong, i agree the performance stuff can be fun and useful, but you can't really provide a general (in scope) rational justification without sounding hollow.

  • Great talk. Seems like the JS code is actually calculating the 24999th prime? You insert 1 into the array too that's why iterating from index 1 instead of 0 works. The example seems a little bit contrived – the codie isn't terribly great to begin with.

  • To summarize: Write your JavaScript as if it were C++ and you'll be fine. Initialize variables at the top of a function, and don't mix types, be it in functions or arrays.

  • i would amend that to say 'write JS like C rather than C++'.
    what you said plus
    – no function argument type overloading
    – no try/catch

  • Vyacheslav Egorov says:

    zval under the hood is just a tagged union. tagged pointers (like V8 does it) or NaN-tagging (used by SpiderMonkey, JavaScriptCore) are compact ways to represent tagged unions. they allow to reduce memory footprint and make passing tagged values around faster e.g. tagged pointer will occupy one machine word and can be stored in a single machine register while zval would require several registers.

  • As a user I haven't experienced any change. Sites from today feel pretty much like before the whole JS ending war started. The speed gains are marginal outside synthetic tests unless you try to play a JavaScript port of Quake in the browser…

  • This is a big problem to developers. The reason you haven't seen a change in performance is because the browsers don't do anything that requires performance to run. If a webpage pauses for a noticeable amount of time in IE6 than that web page is broken. The goal here is not to have performance which is noticeable to guys like you, who don't do anything interesting with a browser. It's to make interesting things possible. In the mean time, you can enjoy the spell check, firewall, and advanced UI.

  • That's fair. He probably should have used an example that wasn't such an obvious mistake. But to say v8 isn't javascript is a little harsh, javascript isn't v8, and v8 does things differently than most engines, but they're smart enough to avoid memory leaks when using outbound locations in an array. I think he actually mentioned growing large arrays as the correct convention.

  • It's not a bug, it's a feature. Javascript, unlike C++ but much like other scripting languages, was always intended for the speed of development rather than execution.

  • Ozezies Zies says:

    Ola..hi..gd.afternoon..can.somebody..talk.with.mee..perhap..i need a someimportant to talk..oze..with all my webmaster and all developer..s.o.s

  • The *actual* point is that something that would be a crash error (or something more disastrous) in another language may simply result in allocation/conversion costs in JavaScript. At the *language level*, JS doesn't *have* arrays as they are normally constructed; the creation of arrays (as opposed to dictionaries) is a JIT/run-time optimization (and there are still no hard bounds). There is no "out of bounds", there is only accessing unassigned members.

  • Tommy Ray Staley Jr. says:

    Really true been please help really tru how do it fix on Ipad 2 new YouTube Application Crash Your Last Session Closed Unexpectedly Send A Crash Report?

  • Leonid Romanov says:

    In my opinion, this really shows how bad JavaScript is as a programming language. People been complaining for ages how easy it is to shoot yourself in the foot with C++, but it seems to me that JS is even worse in this regard. I wish the language itself was progressing with the same speed as JS engines.

  • No matter what language you build your application in, always make sure you know how your language works so you can have optimized code.

  • This is kind of funny, that after so much talking about that primes calculation algorithm and optimizing the heck out of it you arrived with 0.044s to calculate the 25,000th prime. Also your calculation is wrong, you calculated the 24,999th prime, by the way (the first prime is 2), and the 25,000th prime is 287117.

    I didn't know most of what was explained in the video, but in my first take at the primes calculation it took only 0.025s to calculate the 25,000th prime 🙂 Talking about optimization here, huh! 🙂

    Here's the entire test:

    function nextPrime(value) {
    if (value > 2) {
    var i, q;
    do {
    i = 3;
    value += 2;
    q = Math.floor(Math.sqrt(value));
    while (i <= q && value % i) {
    i += 2;
    }
    } while (i <= q);
    return value;
    }
    return value === 2 ? 3 : 2;
    }

    var value, start = Date.now();
    for (var i = 0; i < 25000; i++) {
    value = nextPrime(value);
    }
    console.log('Value:', value);
    console.log('Time:', Date.now() – start);

  • Lakshan Dissanayake says:

    Hey guys, Check out performance of Golang.. incredible
    https://gist.github.com/supunz/44c79820a6b9399afa3f56341c2825ef
    Test 1: 25000th prime is 287117 took 49.374381ms //1 is not a prime no in my case

  • Summary is: initialize your data first (instead of just assuming default values for variables / array elements) and avoid changing data type for any variable / array element. That's all. (funny [and kinda obvious] that those things aren't permitted in common strong-typed languages like C, Java, etc)

Leave a Reply

Your email address will not be published. Required fields are marked *