Cyber Security Leituras, traduções e links: The Solution: WebSockets

Saturday, January 20, 2018

The Solution: WebSockets

No doubt you’ve heard people talking about HTML5 and all its neat new features. Two of these new features directly apply to realtime web technologies and client server communication—a fantastic result demonstrating that the web standards organizations and browser vendors really do listen to our feedback.

Server-Sent Events and the EventSource API¹³ are a formalization of the HTTP streaming solution but there is one more solution that’s even more exciting.

You may have heard the term WebSockets a time or two. If you’ve never really looked into realtime before, WebSockets may not have shown up on your radar except as a buzzword in articles talking about all the great new features of HTML5. The reason why WebSockets are so exciting is that they offer a standardized way of achieving what we’ve been trying to do through Comet hacks for years. It means we can now achieve client server bidirectional realtime communication over a single connection. It also comes with built-in support for communication to be made cross-domain.

Figure 1-5. WebSockets open a full-duplex connection, allowing bidirectional client server communication

The WebSocket specification is part of HTML5, which means that web developers can use the WebSocket protocol in modern browsers.¹⁴

According to the WHATWG,¹⁵ the WebSocket protocol defines a standardized way to add realtime communication in web applications:

The WebSocket protocol enables two-way communication between a user agent running untrusted code running in a controlled environment to a remote host that has opted-in to communications from that code. The security model used for this is the Origin-based security model commonly used by Web browsers. The protocol consists of an initial handshake followed by basic message framing, layered over TCP. The goal of this technology is to provide a mechanism for browser-based applications that need two-way communication with servers that does not rely on opening multiple HTTP connections (e.g. using XMLHttpRequest or <iframe>s and long polling).¹⁶

One of the most beneficial implications of widespread WebSocket support is in scalability: because WebSockets use a single TCP connection for communication between the server and client instead of multiple, separate HTTP requests, the overhead is dramatically reduced.

The WebSocket Protocol

Because full-duplex communication cannot be achieved using HTTP, WebSocket actually defines a whole new protocol, or method of connecting to a server from a client.

This is accomplished by opening an HTTP request and then asking the server to “upgrade” the connection to the WebSocket protocol by sending the following headers:¹⁷

GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13

If the request is successful, the server will return headers that look like these:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: chat

This exchange is called a handshake, and it’s required to establish a WebSocket connection. Once a successful handshake occurs between the server and the client, a two-way communication channel is established, and both the client and server can send data to each other independently.
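As a rough illustration of how the two sample headers above relate, the sketch below derives the Sec-WebSocket-Accept value from the Sec-WebSocket-Key, following the rule in RFC 6455: append a fixed GUID to the key, take the SHA-1 hash, and Base64-encode it. It assumes Node.js and its built-in crypto module; the helper name computeAccept is made up for this example.

```javascript
const crypto = require('crypto');

// Fixed GUID that RFC 6455 defines for the opening handshake.
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Hypothetical helper: derive the Sec-WebSocket-Accept value a server
// would return for a given Sec-WebSocket-Key.
function computeAccept(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WS_GUID)
    .digest('base64');
}

// Using the key from the sample request above:
console.log(computeAccept('dGhlIHNhbXBsZSBub25jZQ=='));
// prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```

The printed value matches the Sec-WebSocket-Accept header in the sample response, which is how the client can tell that the server really understood the WebSocket handshake.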

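Data sent after the handshake is enclosed in frames, which are essentially chunks of information. In the final version of the protocol (version 13, as requested in the handshake above), each frame begins with a short header of at least two bytes that carries an opcode and the payload length, and frames sent by the client also carry a four-byte masking key. A short message therefore costs only a few bytes of overhead in addition to the message’s size. (Early drafts of the protocol instead delimited each text frame with a 0x00 byte at the start and a 0xFF byte at the end.)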
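To make the framing overhead concrete, here is a minimal sketch of how a short, unmasked text frame (the server-to-client direction) is laid out under RFC 6455, the version 13 protocol requested in the handshake above. It assumes Node.js for its Buffer type; the helper name encodeTextFrame is made up, and the sketch deliberately ignores longer payloads, masking, and fragmentation.

```javascript
// Hypothetical helper: build a single unmasked text frame for a short message.
function encodeTextFrame(text) {
  const payload = Buffer.from(text, 'utf8');
  if (payload.length > 125) {
    // Longer payloads need an extended length field, omitted in this sketch.
    throw new RangeError('this sketch only handles payloads up to 125 bytes');
  }
  // First byte: 0x80 (FIN, final fragment) + 0x1 (text opcode) = 0x81.
  // Second byte: mask bit clear, followed by the 7-bit payload length.
  const header = Buffer.from([0x81, payload.length]);
  return Buffer.concat([header, payload]);
}

console.log(encodeTextFrame('hello').toString('hex'));
// prints "810568656c6c6f": two header bytes, then the five bytes of "hello"
```

In this direction the overhead for a short message is two bytes; a frame sent by the browser would carry four more bytes for the masking key.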

So we’ve made it very clear that this is great news for web developers. But it’s not all unicorns and ice cream cones, unfortunately: as ever, we’ll be waiting for a minority of users and companies to upgrade to modern browsers. We’re also going to be waiting for some parts of the Internet infrastructure to catch up. For instance, some proxies and firewalls block legitimate WebSocket connections. This doesn’t mean we can’t start using them in our applications, however.

HTML5 WebSocket Technology and Pusher

We already talked a bit about WebSocket and realtime, but let’s recap: HTML5 WebSocket allows applications to push data to the client rather than requiring the client to constantly ask for new data.

EXERCISE 2-6: TRYING OUT THE WEBSOCKET API

Let’s have a look at the native WebSocket API to get an idea of how it can be used. Create an HTML file with the following content. This file contains JavaScript that connects to a WebSocket echo test service. This means that you can test connecting, sending, and receiving messages.

<!doctype html>
<html lang="en">

    <head>
        <meta charset="utf-8" />
        <title>Trying out the WebSocket API 02-06</title>
    </head>

    <body>

        <script>
            var ws = new WebSocket( 'ws://echo.websocket.org' );

            ws.onopen = function() {
              console.log( 'connected' );
              console.log( '> hello' );
              ws.send( 'hello' );
            };
            ws.onmessage = function( ev ) { console.log( '< ' + ev.data ); };
            ws.onclose = function() { console.log( 'closed' ); };
            ws.onerror = function() { console.log( 'error' ); };
        </script>

    </body>

</html>

If you open up this page in a browser that supports WebSocket and open up the browser’s JavaScript console, you’ll see the following:

connected
> hello
< hello

The connected message is displayed when WebSocket has connected to the server, and the onopen function handler has been called. The code then logs > hello to indicate it’s going to send hello over the WebSocket connection to the server using the WebSocket send function. Finally, when the server echoes back the message, the onmessage function handler is called, and < hello is logged to the console.

This demonstrates how to use the WebSocket API and gives you a glimpse of how useful it could be. But, as we covered in Chapter 1, the WebSocket API is not fully supported in all browsers just yet, and we need a fallback mechanism. As a result, implementing realtime apps can be cumbersome, tricky, and extremely time-consuming if we have to handle browser compatibility issues ourselves.

Fortunately for the rest of us, there are a number of services out there that have overcome these hurdles and created APIs that start by checking for WebSocket support and then fall back to the next-best solution, one step at a time, until they find one that works. The result is powerful realtime functionality without any of the headache of making it backward-compatible.
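The idea of checking for the best available transport can be sketched in a few lines. This is only an illustration of feature detection, not how any particular service implements it; the function name pickTransport and the returned labels are invented for this example, and the env parameter stands in for the browser’s global object.

```javascript
// Hypothetical helper: report the best realtime transport that `env`
// (normally the browser's global object) appears to support.
function pickTransport(env) {
  if (env.WebSocket) {
    return 'websocket';            // full-duplex, preferred
  }
  if (env.EventSource) {
    return 'server-sent-events';   // server-to-client streaming only
  }
  return 'long-polling';           // plain HTTP requests work everywhere
}

// In a page or in Node.js you would pass the real global object:
console.log(pickTransport(globalThis));
```

A real library does considerably more than this (for example, it also has to cope with proxies that block a transport the browser nominally supports), but the order of preference is the same.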

Among these companies offering realtime services, Pusher stands out for its extreme ease of implementation, free accounts for services that don’t have large user bases, great documentation, and helpful support staff.

Pusher provides a JavaScript library¹¹ that not only handles fallbacks for older browsers but also offers functionality such as auto-reconnection and a Publish/Subscribe¹² messaging abstraction through its API, which can make it much easier to use than simply dealing with generic messages, as would be the case if we used the native WebSocket API.

Finally, because Pusher is a hosted service, it will handle maintaining the persistent connections over which data will be delivered and can deal with scaling to meet demand for us. Although this latter point might not be a big deal for our sample application, it’s a valid consideration when you are building a production application.

For those reasons, we’ll be using Pusher in this book to build our realtime app.

Why Do We Need It?

Pusher will allow you to add realtime notifications and updates to the application, including the following:

Updating all users when a new question is added: This means that when a user adds a new question, all users currently using the app in that room will receive the new question immediately.
Updating attendees when the presenter marks a question “answered”: When the presenter answers a question, marking it “answered” will instantly update all attendees’ devices to prevent confusion.
Updating the presenter when more than one attendee wants the same question answered: If more than one user is interested in having a question answered, they can upvote that question. The presenter will receive a visual indication to let them know that the question is pressing.
Updating all attendees when a room is closed: When the presenter closes the room, attendees need to be updated so they know not to ask any questions that won’t be answered.

What Role Does It Play?

Pusher will play the role of the app’s nervous system: it will be informed when changes are made and relay that information to the brains of the app so that they can process the information.

How Does It Work?

In the simplest terms, Pusher provides a mechanism that lets the client “listen” for changes to the app. When something happens, Pusher sends a notification to all the clients who are listening so that they can react appropriately. This is the Publish/Subscribe paradigm we mentioned earlier.
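To show the shape of the Publish/Subscribe idea, here is a tiny in-memory sketch. It is not Pusher’s API and involves no network at all; the class name TinyChannel, its subscribe and publish methods, and the question-added event name are all invented for this illustration.

```javascript
// Hypothetical in-memory channel: listeners subscribe to named events,
// and every listener for an event is called when that event is published.
class TinyChannel {
  constructor() {
    this.handlers = new Map(); // event name -> array of listener functions
  }

  subscribe(eventName, handler) {
    if (!this.handlers.has(eventName)) {
      this.handlers.set(eventName, []);
    }
    this.handlers.get(eventName).push(handler);
  }

  // Calls every listener for eventName and returns how many were notified.
  publish(eventName, data) {
    const listeners = this.handlers.get(eventName) || [];
    listeners.forEach((handler) => handler(data));
    return listeners.length;
  }
}

const room = new TinyChannel();
room.subscribe('question-added', (q) => console.log('new question: ' + q.text));
room.publish('question-added', { text: 'What is a WebSocket?' });
// prints "new question: What is a WebSocket?"
```

The publisher never needs to know who is listening, which is what lets a hosted service sit in the middle and fan each event out to every connected client.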

Chapter 3 is dedicated to the finer details, so we will skip the exercise in this section.

OAuth

Unlike the technologies discussed so far, OAuth is a protocol, not an actual programming language. It’s a concept that was drafted in 2007 to address the issue presented by websites that provided services that overlap; think about how social networks can access your address book to look for friends or how a photo sharing site can tie into Twitter to let your followers know when you’ve posted a new photo.

The problem was this: when these services first started to work together, they required that users provide a username and password to access the service, which was potentially a huge risk. What was to stop a shady service from using that password for its own purposes, up to and including the possibility of changing your password and locking you out?

This was a big concern. OAuth devised a solution based on its study of a number of other attempts to solve the problem, using what it considered to be the best parts of each.

To paraphrase an excellent analogy from the OAuth website:¹³

OAuth is like giving someone the valet keys to a luxury car. A valet key will only allow the car to drive a few miles; it doesn’t allow access to the trunk; it prevents the use of any stored data in the car’s onboard computers, such as address books. OAuth is similar to a valet key for your online services: you don’t provide your password, and you’re able to allow only certain privileges with the account without exposing all of your information.

For instance, Facebook uses OAuth for user authentication on third-party services. If you’re already logged in to Facebook, you’re presented with a dialog (on Facebook’s domain), telling you which permissions are required and allowing you to accept or deny the request. Privileges are compartmentalized—reading someone’s timeline is different from viewing their friends list, for example—to ensure that third-party services receive only the privileges they need to function.

This keeps users safe and reduces liability for web apps. It also provides a wonderful benefit for developers: we can allow a user to log in to our app with their Facebook, Twitter, or other credentials using a simple API.

Why Do We Need It?

We don’t need it in the app that we’re building, but it would be a neat feature so we’ve included it in Appendix A if you want to see how it could be included. In a nutshell, we would use OAuth to eliminate the need to build a user management system. This would also hugely reduce the time needed to sign up for an account without reducing the app’s access to the information it needs to function.

Let’s face it: most people have more accounts than they can remember on the Internet. The difference between someone using our app and not using our app could be something as simple as how many buttons they have to click to get started.

OAuth provides a great way to get everything we need:

Verify that the person is indeed real: We can reasonably assume that anyone who is signed into a valid Facebook or Twitter account is a real person.
Collect necessary data about the user: For this app, we would really only need a name and e-mail.
Reduce the barrier to entry: By eliminating all the usual steps of creating an account, we could get the user into our app in seconds with just two clicks.

What Role Does It Play?

OAuth would be the gatekeeper for our app. It would use third-party services to verify the authenticity of a user and gather the necessary information for the app to function.

How Does It Work?

You’ll find more details on the specifics of OAuth in Appendix A, but at its core, OAuth contacts the service through which we want to authenticate our user and sends a token identifying our app. The user is prompted to log in to the third-party service if they’re not already logged in, and then to allow or deny the privileges our app requests. If the user allows our app to access the requested data, the service sends back a token we can use to retrieve the necessary data and consider a user “logged in” to our app.
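The first step of that exchange, sending the user to the third-party service along with something that identifies our app, might look like the sketch below. It assumes the OAuth 2.0 authorization code flow, which the text above does not specify; the endpoint, client ID, redirect URI, scopes, and the buildAuthorizeUrl helper are all placeholders invented for this example, and a real provider’s documentation dictates the actual values.

```javascript
// Hypothetical helper: build the URL we would redirect the user to so the
// provider can ask them to allow or deny the requested privileges.
function buildAuthorizeUrl(authorizeEndpoint, clientId, redirectUri, scopes, state) {
  const url = new URL(authorizeEndpoint);
  url.searchParams.set('response_type', 'code');   // ask for an authorization code
  url.searchParams.set('client_id', clientId);     // identifies our app
  url.searchParams.set('redirect_uri', redirectUri);
  url.searchParams.set('scope', scopes.join(' ')); // privileges we request
  url.searchParams.set('state', state);            // echoed back to us unchanged
  return url.toString();
}

// All values below are placeholders, not a real provider or app.
console.log(buildAuthorizeUrl(
  'https://provider.example/oauth/authorize',
  'my-app-id',
  'https://app.example/callback',
  ['name', 'email'],
  'random-state-value'
));
```

After the user approves, the provider redirects back to the redirect URI with a short-lived code, which the app’s server exchanges for the token it then uses to fetch the user’s name and e-mail.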

Summary

At this point, we have successfully defined a rough list of functionality and requirements for our app. We also used that information to flesh out a list of tools we will use to bring the app to life.

In the next chapter, you’ll get familiar with Pusher and its underlying technologies, and you’ll build your first realtime application.

¹ This is 100% the opinion of the authors.

² http://www.apress.com/9781430228745

³ http://www.apress.com/9781430228479

⁴ http://socket.io

⁵ http://faye.jcoglan.com

⁶ http://signalr.net/

⁷ http://www.tornadoweb.org/

⁸ http://www.apress.com/9781430224730

⁹ http://www.sequelpro.com/

¹⁰ http://www.navicat.com/en/

¹¹ http://pusher.com/docs/client_libraries/javascript