Media APIs for the multi-platform web – Chrome Dev Summit 2013 (Sam Dutton, Jan Linden)

MALE SPEAKER: So back in
the dark days of the web, doing anything even
remotely fun with audio, video, real-time
communication, that was the domain of plug-ins. And we don’t really
like plug-ins. This is no longer true. We have a fantastic mix of
low- and high-level APIs that let us sort of get at this stuff, send data between machines,
and manipulate audio and video. It’s great stuff. To show us how to get the
most out of those APIs, I’ll introduce our next
speakers, Sam Dutton and Jan Linden, everyone. [APPLAUSE] SAM DUTTON: Thank you. Thank you very much, Jake. I’ve never been
called that before. Yeah, I am Sam Dutton. I’m a developer advocate
for Chrome based in London. JAN LINDEN: And I’m Jan Linden. I’m a product manager
on the Chrome media team with a special
focus on audio. SAM DUTTON: And everyone
knows Chris Wilson, who is going to be performing
for us a bit later. So if I can get my
clicker to work. There’s going to be quite a lot
of links in these slides, which might work better online
than on the screen here. So if you want to follow
along, these slides are up at simpl.info/media. So I think we’ve seen two
big changes on the web. And these have happened
really suddenly. The first is the stuff that
Linus was talking about, the rise of this
multi-device web. You know, we’ve got
more mobile phones than, like, toothbrushes
or pencils and so on. There’s this
commoditization of computing on a scale we haven’t seen
since the PC boom of the ’80s and ’90s. But the thing is
that it’s not just kind of the same thing
with more computers. The change of
devices is actually changing what we’re
doing on the web. And in particular, like
Jake was referring to, we’re seeing the
rise of audio and video and real-time
communication on the web. And this is, in a sense,
I think a lot of it is to do with the fact that
we’re using these devices now which are naturally
oriented to consuming media and communicating. You know, phones and
tablets and so on. This is actually having a
knock-on effect elsewhere. I mean, a lot of us, when
we watch television now, we do other stuff
with mobile devices. We access other media while
we’re watching television. And in a lot of countries,
in fact, TV viewing is down. Online viewing is up. I just read actually in
“The Economist” about China, that in Beijing, like only 30%
of people now use a television. And that’s down from like
70% of households three years ago. I mean, popular
online shows in China, they get audiences
of like 250 million. And so the predictions are that
within a few years, something like maybe 90% of
the bits that cross the internet will
actually be video. This translates to something
like a million minutes of video a second, which is incredible. So I mean, it’s
not that we’re all going to build the next
YouTube or whatever. But we’re coming
to expect this kind of seamless audio, video,
real-time communication in our apps. And we’re expecting
that on all our devices. So what have we got? Well, we have these media
APIs for the open web, and we have these interoperable
standardized technologies working across a huge
range of devices. So in the next 25 minutes, we’re
going to just try and touch on some of the things we can
do with those APIs on the web. Perhaps the best place to
start– I just [INAUDIBLE] any more feedback– is WebRTC. This is perhaps the
most ambitious and, to use a kind of odious word, most disruptive project on the web. This is real-time communication
built into the browser. Open source, plug-in
free, free for end users, free for developers. And this has been like a missing
piece of the web platform. And this is kind of crucial for
building web apps on devices which are oriented
to communication. So maybe the best way
to show this in action is let’s have a chat. We have– well, I’m going to use
Opera, actually, on the desktop here to get to that address. JAN LINDEN: –Android tablets. I think that’s best. SAM DUTTON: I can see
someone else has already joined us there. We’ve got Firefox running
over there on the Nexus, and then Chrome for
Android on the Nexus 7. And you can see that’s
working pretty well. And this is like
a full mesh setup, everyone connected to everyone. So it’s not actually
particularly efficient. Hello, Brian. So, I mean, this is great. We’re seeing great stuff
coming through with WebRTC. We’ve got really great adoption. Firefox, Chrome, and now Opera
on desktop and on Android, bindings for Java and Objective-C. And yeah, pretty recently, the Qt framework started using
Chromium, which gives us access to loads of stuff,
including WebRTC. So we now have well over a billion endpoints and a lot more predicted
for the near future. So I guess the obvious
use case for WebRTC is video chat like
we were just showing. But there have been some really
innovative apps coming through. Just back in August,
helped out by a company called vLine, Sky did a live interview in South London, where I live. And they did that
all with WebRTC. You know, in the old days,
to do stuff like this, you needed all this kit, all
these people, these trucks, all this stuff. And they did it with
this really simple setup, using a camera
like I’ve got here and one of these $100 Yeti mics. And I think this is amazing. This is like the
democratization of broadcast. I love this stuff. So in the old days,
real-time communication involved something like this,
this highly simplified diagram. But you were required
to go through some kind of centralized service. And essentially, that
box in the middle was acting like a
relay service, which is just inherently inefficient. Perhaps the most radical
thing about WebRTC is that its architecture
is inherently peer to peer. So in other words, wherever
you’re calling from, so to speak, whether you’re
on Wi-Fi or on a cell, fixed, whatever, the idea
is that you can communicate, streaming data
directly between peers. And this is, of course, far
more efficient and better for performance. So what do we need for
real-time communication? Well, obviously, we need to
be able to get audio and video from our camera and our mic. And then we need
to be able to make a connection between the
two peers, the caller and the callee. And that needs to
be able to cope with the real world of NATs
and firewalls and so on. Once we’ve made that connection,
we need to be able, then, to have really high-performance streaming that works in this modern world of multiple devices. So audio without
jittering and video that’s smooth and so
on and so on and so on. And I guess the
other part of WebRTC has always been the idea
that we could communicate arbitrary data as well. So not just audio and video,
but all types of data, binary data, text, and so on. And we’re seeing some
really interesting use cases for that coming through. So for this, we have three APIs. MediaStreams. People may know it
better as getUserMedia. Has anyone here built an
app using getUserMedia? Oh, that’s great. That’s like the most
I’ve seen in an audience. That’s fantastic. And then, of course, we
have RTCPeerConnection and RTCDataChannel, which
I’ll talk about a bit. So on one level, this is
kind of a simple concept. You get some media,
and then you plumb that into RTCPeerConnection to
get it to the other side. Once you get it
to the other side, you’ve got your media
stream, and then you can do something like plug
that into a video element.
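As a rough sketch of that plumbing, here is roughly what the caller’s side looks like. The sendSignal function is a hypothetical stand-in for your own signaling channel, which WebRTC deliberately leaves up to the app, and the prefixed webkitRTCPeerConnection constructor reflects Chrome at the time of this talk.

```js
// Minimal caller-side sketch. sendSignal() is a placeholder for your own
// signaling channel (WebSocket, XHR, whatever); WebRTC doesn't specify one.
var RTCPeerConnection = window.RTCPeerConnection || window.webkitRTCPeerConnection;
var pc = new RTCPeerConnection({iceServers: [{url: 'stun:stun.l.google.com:19302'}]});

// Plumb the local stream (from getUserMedia) into the connection.
pc.addStream(localStream);

// Trickle ICE candidates to the other peer as they're gathered.
pc.onicecandidate = function(event) {
  if (event.candidate) {
    sendSignal({candidate: event.candidate});
  }
};

// When the remote stream arrives, plug it into a video element.
pc.onaddstream = function(event) {
  document.querySelector('#remoteVideo').src = URL.createObjectURL(event.stream);
};

// Create an offer and ship it to the callee over the signaling channel.
pc.createOffer(function(offer) {
  pc.setLocalDescription(offer);
  sendSignal({sdp: offer});
});
```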
Of course, it’s not that simple. RTCPeerConnection, the bit in the middle, so to speak, needs to cope with
the real world, like I said, of NATs and
firewalls, and so on. And also to be able to do
clever stuff to make sure that video and audio
work really, really well across a range of devices
and a range of contexts. JAN LINDEN: Yeah, and echo
cancellation, all these things that make for a good experience. SAM DUTTON: Yeah,
there’s a stack of stuff that goes on in incredible
detail in the design there. And RTCPeerConnection,
the whole of WebRTC is also designed with
security in mind. So at every stage, everything
is encrypted, within the call and when you’re communicating
between peers, which is kind of crucial
to this project. So if you want to have a look
at WebRTC, check this out. This is kind of the
absolute basics. We also have a kind of
full fat video chat app which is maintained by
Google, which kind of does everything and has
really verbose logging. So it’s a good place to
work out what’s going on. The thing about
RTCDataChannel, like I say, this is the API for other
kinds of data, arbitrary data. Imagine you’re playing a game. And this is highly
simplified, but you know, you want to exchange
player positions, for example, and do that with low
latency and effectively trying for real-time. [PHONE RINGING] SAM DUTTON: Well, we have
RTCDataChannel for that. And this has exactly– I
don’t know who is ringing. Someone. So this has exactly the
same API as WebSocket. But what we’re aiming for
here is really low latency, firstly because the idea is that
you’re communicating directly between peers. And using the SETP
protocol means we have a lot of the
advantages of UDP. So what we can do is
optionally specify that we don’t need the
connection to be reliable. In other words, if you’re
exchanging player positions, it doesn’t matter if you lose
some packets along the way. What you really want
is high performance. Whereas, say, for file-sharing,
you can’t lose any bits. That’s not going to work. So you might take a
slight hit on performance to make sure that the
transport is reliable. So again, if you want
to check this out, this is a good place to start. Just wanted to show you a really
nice app I’ve seen using this. I don’t know if people
have seen Sharefest, but what this gives
is the ability to share files using
RTCDataChannel without having to go through an
intermediary service. So what I’ve done is I’ve
selected a file there, Sharefest gives me
a URL, and then I can do something like
post that URL to someone. And then they can open
that file at the other end. So when they come
to open a file, a peer connection will be
established between the two browsers, so to speak. And then data
transfer of the file will happen directly
between the two without, like I say, going
through an intermediary server. It’s kind of interesting
stuff going on there. Perhaps a more radical project,
have people seen peerCDN? This blows me away. This is this idea of
peer-to-peer content distribution using
RTCDataChannel. So the way it works
is that you put a link to peerCDN’s JavaScript at
the top of your web page. And then the way it works
is that it will try then to get assets on the
page from local peers rather than defaulting
to downloading them from the server. So yeah, it’s a
kind of prototype, but it’s this kind of vision for
a really different way of doing things on the web. So I kind of skipped
over getUserMedia. I just wanted to
sort of go into that in a little bit more detail. GetUserMedia is pretty
straightforward. You can see here. I mean, you call getUserMedia
on the navigator object, and you pass it a
constraints object. This is actually
really powerful. This is defining what
kind of media we want. In this example, all I’m
saying is, give me video. Just give me the default video. And then the success callback is passed a stream, which you can then do what you want with.
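That slide boils down to something like this minimal sketch, using the prefixed navigator.webkitGetUserMedia that Chrome shipped at the time:

```js
// Ask for the default video source (no audio requested).
navigator.webkitGetUserMedia({video: true},
  function(stream) {
    // Success callback: plug the stream into a <video> element.
    var video = document.querySelector('video');
    video.src = URL.createObjectURL(stream);
  },
  function(error) {
    // Error callback: e.g. the user denied camera access.
    console.log('getUserMedia error:', error);
  }
);
```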
Let’s see that in action, in fact. So yeah, I’m going to open this page, which calls getUserMedia. Notice that I have to explicitly
allow access to my camera there. And then, bang. We’re getting video
from the– I’ve got a USB camera plugged
in there, you can see. Hello, everyone. What’s nice about
this, though, is when you see these–
I don’t know, these APIs coming up
against each other. So in this example, we’re
getting video from the camera. And let me turn around this
way, and you can actually see something. What’s happening is
we’re getting frame grabs from the camera, plumbing
those into a canvas elements, and then analyzing that
canvas element pixel by pixel to give us ASCII art. Which is, it’s just
nice when these APIs can interact like this. Thinking about working
with constraints, we can also use
constraints to select the resolution of our camera. So– wow, that’s slow. If we go to this web
page, there we go. In fact, let’s look at the
source for this on GitHub, if that’ll open. You’ll see, if you
can see here, I’ll just bump up the
size a little bit, we’ve got three different
constraints objects. These each represent kind of low, medium, and high res, so to speak.
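Those three objects look roughly like this. The mandatory min/max syntax is what Chrome understood at the time, the exact numbers are illustrative, and gotStream and logError are hypothetical callbacks like the ones in the basic example above.

```js
// Three constraints objects: low, medium, and high res.
var qvgaConstraints = {
  video: {mandatory: {maxWidth: 320, maxHeight: 180}}
};
var vgaConstraints = {
  video: {mandatory: {maxWidth: 640, maxHeight: 360}}
};
var hdConstraints = {
  video: {mandatory: {minWidth: 1280, minHeight: 720}}
};

// Each button just calls getUserMedia again with a different object.
navigator.webkitGetUserMedia(hdConstraints, gotStream, logError);
```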
So when we click the button here, we’re getting 640 by 360 right up to HD here,
which is– oh, sorry, I’ve still got the other
one open here. So let me just close that. And then, hang on, just
pull that through again. So basically, we’ve got the
ability to choose resolutions. There we go. You can see I’m getting
the full HD from– hello. Wave to the camera. This is kind of
crucial, of course, also when we’re thinking
about getting media from a mobile device
and making sure that we’re not overdoing
it in terms of bandwidth. We can also use constraints
to select camera sources. So let’s do that now. I’m going to go for
this on the device here. In fact, let’s show it on
mobile device, I think. I can actually, if we look
at– you can come over here, and I’ll show it to you. JAN LINDEN: OK. Sure. SAM DUTTON: So
basically, we can get the ID for the different
devices attached to the machine. And then we can select
different cameras, depending on which we want.
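At the time of this talk, Chrome exposed device IDs through MediaStreamTrack.getSources(), so a sketch of that flow looks something like the following. The same idea now lives in navigator.mediaDevices.enumerateDevices(), and gotStream and logError are again hypothetical callbacks.

```js
// Enumerate the attached sources and keep the video ones (the cameras).
MediaStreamTrack.getSources(function(sourceInfos) {
  var videoSources = sourceInfos.filter(function(source) {
    return source.kind === 'video';
  });

  // Request a specific camera by passing its ID as a constraint,
  // e.g. to flip between selfie mode and crowd mode on a phone
  // (assuming the device actually has a second camera).
  var constraints = {
    video: {optional: [{sourceId: videoSources[1].id}]}
  };
  navigator.webkitGetUserMedia(constraints, gotStream, logError);
});
```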
And as you can imagine, this is kind of bizarre. This is crucial, again, for
devices like phones and slates where we obviously want to
be able to do selfie mode or kind of crowd mode. So there we go there, yeah. They’re just about,
just about— JAN LINDEN: I prefer selfie. SAM DUTTON: Selfie is good. And again, this is all
done with constraints. Same with screen capture. We can do the same thing. So I’ll just go out
of full-screen mode. So we’ve specified
in the constraints that we actually want to get
video from a screen capture.
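The constraint in question looked something like the sketch below. chromeMediaSource is a Chrome-specific mandatory option, and screen capture also had to be enabled in the browser at this point.

```js
// Chrome-specific constraint: use the screen itself as the video source.
var screenConstraints = {
  video: {
    mandatory: {chromeMediaSource: 'screen'}
  }
};
navigator.webkitGetUserMedia(screenConstraints, gotStream, logError);
```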
Which is slightly ridiculous, because we’re getting a screencast of a screencast, but anyway. You can imagine there are lots of good use cases for that. People ask about recording. We have an API for
this coming through, being specified and implemented. MediaStream Recording for
recording audio and video. There’s also the MediaStream
Image Capture API, which is kind of a
taking photos API. You can do that with
getUserMedia and Canvas. But this gives us
access– well, it’s proposed to give access to stuff
like camera autofocus and zoom and so on. So I think that’ll be great. Again, really good
for mobile devices. So there’s a lot
about video, and I was wondering if
Jan could tell us a little bit about
the world of audio. JAN LINDEN: Yeah, what’s
the deal with that audio? It’s pretty much
nothing, isn’t it? So let’s focus on
audio a little bit. So we have a great
thing that’s been around for quite a while,
the audio tag. Well supported,
very easy to use, and it takes care of everything. Loads, decodes, buffers,
and plays out your audio, and even includes a player. So you don’t have to do
anything more than that line, and you can play an audio file. That’s great. But there are applications
where that is just not enough. For example, precise timing
is super important in gaming, as is being able to handle many different sounds at the same time, and in music creation, et cetera. So we came up with
the Web Audio API. And the Web Audio API is really
just a pro audio environment in the browser. And it involves a lot of stuff. So let’s start with just
a look at what you can do. As I said, you can do
timing, but you can also do cool things
like create audio. So Chris here has
built a synthesizer. And all this, the audio
is created by Web Audio. And with all the controls,
you can switch things on with perfect timing
at any time you want. And some of the
things– there you go. [INAUDIBLE] CHRIS WILSON: I’m not
going to stay up here. JAN LINDEN: All right. So Web Audio can do that. So we just saw an example
of creating audio. We can do timing. We can also analyze audio data. That’s super nice in applications like visualizers and things. And even if you still want that long file that you download with the audio tag, you can just plumb that into Web Audio, and you get the ability to add visualizers, effects, et cetera on top of that. So the audio tag can come as a way of loading audio into Web Audio. So I’m not going to go into all
the details about Web Audio. It’s just such a powerful tool. And what’s really
nice, all these things are implemented in native
code in the browser. So you don’t have to
do this in JavaScript. In JavaScript, you have a
nice, node-based pipeline where you just plug in things
and easily put it together.
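Here is a tiny sketch of that node plumbing, just to make it concrete: an oscillator through a gain node to the speakers, with sample-accurate scheduling.

```js
// Chrome still used the webkit prefix at the time.
var context = new (window.AudioContext || window.webkitAudioContext)();

// Build a little pipeline: oscillator -> gain -> speakers.
var osc = context.createOscillator();
var gain = context.createGain();
osc.frequency.value = 440;  // concert A
gain.gain.value = 0.5;      // half volume

osc.connect(gain);
gain.connect(context.destination);

// Sample-accurate timing: start now, stop exactly one second later.
osc.start(context.currentTime);
osc.stop(context.currentTime + 1);
```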
But it’s really pro audio. For example, we can have input to
output in five milliseconds. And I’ll give an example of
how that matters in a second. The support for Web Audio
is also really great. It’s really a pleasure to note
that Firefox just two weeks ago released Web Audio on
both desktop and Android. This means that we have Web
Audio in Chrome, Safari, Firefox, on desktop, all desktop
platforms, on Android, and iOS. So– and Opera, sorry, sorry. Yeah, that’s so new that
it’s not even in my mind. And it’s not even on our slides. It’s your fault. OK, so one thing that’s
really, really exciting, and we started with this, is
that we have all these media APIs and that they
work together. And we can do really,
really cool things together. And one example is how
you hook up getUserMedia that Sam talked about and
showed video examples of. But you can obviously use that
to hook up your microphone and get an audio stream
and do something with it. And a great thing you
can do is you can plug it in as a source in Web Audio. So you get your
microphone input. That’s a source in Web Audio. Then you can apply anything
you want in Web Audio. For example, one
great example would be you plug in your guitar,
you plug it in here. And the audio that
comes into Web Audio, you can apply
filters and effects. And then you play it out
five milliseconds later. So basically, you have an
effects box in your browser.
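A sketch of that effects-box idea: the getUserMedia stream becomes a source node, goes through one filter standing in for a whole pedalboard, and comes out the speakers.

```js
// Ask for audio input: the mic, or a guitar on an audio interface.
navigator.webkitGetUserMedia({audio: true}, function(stream) {
  var context = new (window.AudioContext || window.webkitAudioContext)();

  // The live input becomes just another source node in the graph.
  var source = context.createMediaStreamSource(stream);

  // One simple effect standing in for filters, delays, reverbs, etc.
  var filter = context.createBiquadFilter();
  filter.type = 'lowpass';
  filter.frequency.value = 800;

  source.connect(filter);
  filter.connect(context.destination);  // out to the speakers
}, function(error) {
  console.log('getUserMedia error:', error);
});
```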
But let’s not only talk about this. Let’s show something and
listen to something here. Many of you have probably
seen this before. Chris here, he built a vocoder,
which is a great instrument, for I/O last year. And since then, he– oh, sorry. I should– since then, he has– [VOCODER PLAYS] So anyway, so since then,
he has added live input. That’s what I was talking
about in getUserMedia. But also now, this
works on mobile. CHRIS WILSON: Can
you hold the camera? JAN LINDEN: See if we
can get the camera. CHRIS WILSON: Hello,
Chrome Dev Summit. JAN LINDEN: Oops. CHRIS WILSON: [VOCODER
VOICE] There we go. Hello, Chrome Dev Summit. JAN LINDEN: Live
from the Nexus 5. CHRIS WILSON: I did also want
to say the synthesizer also works quite well even on iOS. There are a couple of issues
I’m still working through with Firefox. But it really shows
that these things work across mobile and desktop. It’s not just on the desktop. JAN LINDEN: Cool. We are not in the right place. OK. Next thing, when you have the
audio in through getUserMedia to Web Audio, then you want to
plug it into PeerConnection. And then you can talk to
someone on the other side through a WebRTC call
and apply the effects. Next thing you want to do
there is apply these effects on the output as well. That’s in the works. So you can, for example, do
spatialization of the output from a WebRTC call. OK, Chris. Another great thing
that we need for audio is being able to not only use
the keyboard and stuff here, but actually be able to connect
media devices, like a synth. Or let’s look at– CHRIS WILSON: You
did notice before, I was playing this keyboard that’s
connected through the Web MIDI API.
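Web MIDI was brand new at this point (behind a flag in Chrome), and the shape of the inputs collection shifted between early drafts, but the sketch below follows the spec: request access, then listen for messages from the hardware.

```js
navigator.requestMIDIAccess().then(function(midi) {
  // Listen to every connected input: keyboards, knob boxes, controllers.
  midi.inputs.forEach(function(input) {
    input.onmidimessage = function(message) {
      // For note events, data is [status, note number, velocity].
      var data = message.data;
      console.log('MIDI message:', data[0], data[1], data[2]);
    };
  });
}, function(error) {
  console.log('MIDI access refused:', error);
});
```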
But we also have a couple other apps that we’ve– oh. JAN LINDEN: Oh, the audio? CHRIS WILSON: Plug
the audio back in. I knew we had
something else coming. There we go. [MUSIC PLAYING] JAN LINDEN: OK, dance. I won’t dance. CHRIS WILSON: All right, they
said they were going to dance. I just kind of looked– JAN LINDEN: OK,
where’s the music? That was the deal. CHRIS WILSON: The key– JAN LINDEN: He believed this. CHRIS WILSON: It’s
probably better they don’t. The real key here,
though, is that you really need– for a lot of
audio applications, you need this very hands-on
controller experience. And I could do things like,
you know, shuttle through here and find a particular
point, and set a cue. That kind of stuff. There you go. SAM DUTTON: You got your dance. JAN LINDEN: OK. We are quickly running
out of time here. So let’s skip to some
other important things for the web and
media on the web. One thing that we really think
is super important for this to really work for the open web
is open and free video codecs. So we have developed
VP8, and that’s been out for several
years, and it really addresses what you need today. But we didn’t stop there. Now we also have VP9, which
is the next generation that saves more bits at a
really good quality. So you can look here at
the bars in the middle. So we compare H.264 and VP9. And you see the bar
is the bit rate, how much we save all the
time for the same quality. This one’s from Google
I/O. So really a great job from our engineers to do this. And this is open
and free and can be used by anyone at no cost. And the tools. Other important things for
a full media experience include being able to, for example,
do adaptive streaming. Your bandwidth is
not always the same. You need to be able to handle
changes in your bandwidth. And then there are the Media Source Extensions. One of the things they do is allow for that.
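The core of MSE is that your page creates the video’s source itself and feeds it chunks, so an adaptive player can choose which quality of chunk to append as bandwidth changes. A bare-bones sketch, with the segment fetching and bitrate choice left as hypothetical helpers:

```js
var video = document.querySelector('video');
var mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', function() {
  var sourceBuffer = mediaSource.addSourceBuffer('video/webm; codecs="vp9"');

  // fetchSegment() and pickBitrate() are placeholders: measure your
  // bandwidth, grab the next chunk at a suitable bitrate as an
  // ArrayBuffer, and append it to the buffer.
  fetchSegment(pickBitrate(), function(arrayBuffer) {
    sourceBuffer.appendBuffer(arrayBuffer);
  });
});
```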
We also have Encrypted Media Extensions, which make it possible to
handle content protection. And then I’ll leave it to
you for things like– SAM DUTTON: Yeah. So I’ve got– JAN LINDEN: One minute. SAM DUTTON: A few minutes left. Anyway, we’ve got
a little bit extra. So yeah, we’ve got great
support for audio and video on a range of platforms. Also thinking about, like,
captions and subtitles and accessibility. And we’ve got great
support for this as well. Just looking, I
don’t know if people have seen the track element. This is kind of a simple way
of adding subtitles or captions to your applications. Chug, chug, chug, OK. There we go. [VIDEO PLAYBACK] -The web is always changing. [END VIDEO PLAYBACK] SAM DUTTON: Yeah,
and you can see there that we’ve got track subtitles
rendered over the video. And then I’m listening out
for the cue change events and then plunking that
stuff in a div there. So just a really simple way to
make your video more accessible and to add that stuff. The way you do it is
that you add a track element as a child
of the video element. And then you point
that to a file that looks like this,
which is just essentially some information
about timing, and then the text of, in this
case, the subtitles. There’s also some information
about the way the subtitles are displayed, kind of hints about how they might be rendered there.
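The cue-change listening from that demo is only a few lines: something like this sketch, assuming the video element already has a subtitles track attached and a #transcript div to plunk the text into.

```js
var video = document.querySelector('video');
var track = video.textTracks[0];  // the <track> child of the video

track.addEventListener('cuechange', function() {
  // activeCues holds whichever cues are on screen right now.
  var cue = this.activeCues[0];
  if (cue) {
    document.querySelector('#transcript').textContent = cue.text;
  }
});
```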
This little link down the bottom here, actually, I think
this is amazing. Now implemented in
Chrome is the ability to do what’s called
in-band WebVTT. This gives us the
ability to bundle a track file with a
video file in order to be able to distribute
one file that contains the video and the whole thing
in one package, in one file. So we’ve kind of run
out of time here. But I’d just like to go just
to one more demo that I think that really, for me,
sums up how far we’ve come with audio and
video on the web. JAN LINDEN: And here,
you’ll get the real dance. SAM DUTTON: Yeah, this
slightly weird video shows– what we’ve got
here is two videos encoded with alpha
transparency information. And you know, these are
just video elements. So– that is weird. So we have the ability to do,
you know, like CSS filters. These are just elements. We can double-click
on these, and I’ve got a little CSS animation. Double-click the background. You know, we have a video
playing in the background. Actually, these are, like, basic
sort of green screen videos. But actually, the rendering
is really stunning. So yeah, I’m just,
I don’t know, really pleased to see that stuff. So, yeah, please. We really, really
look forward to seeing what everyone makes
of these APIs. Because I think they are a
great feat of engineering and incredibly powerful
across platforms. JAN LINDEN: We only got to
touch on a few things here, but there’s much more, so– SAM DUTTON: OK,
thank you very much. [APPLAUSE]
