Re: Hello from Mozilla from Ian Hickson on 2009-07-17 (squid-dev)

From: Ian Hickson <ian_at_hixie.ch>
Date: Fri, 17 Jul 2009 10:00:09 +0000 (UTC)

On Fri, 17 Jul 2009, Adrian Chadd wrote:
> 2009/7/17 Ian Hickson <ian_at_hixie.ch>:
> >> That way you are still speaking HTTP right until the "protocol
> >> change" occurs, so any and all HTTP compatible changes in the path(s)
> >> will occur.
> >
> > As mentioned earlier, we need the handshake to be very precisely
> > defined because otherwise people could trick unsuspecting servers into
> > opting in, or rather appearing to opt in, and could then send all
> > kinds of commands down to those servers.
>
> Would you please provide an example of where an unsuspecting server is
> tricked into doing something?

Sure.

Suppose we had no handshake at all, and that there was no data framing, so
that as soon as we connected to a port, we could send arbitrary data down.

A Web page, say on evil.example.net, could open a Web Socket connection to
http://www.corp.example.com/, send it a GET request for /secret-plans, and
then forward the contents of the file to a remote host. If the attackers
could then trick someone on example.com's intranet into looking at this
page, and assuming www.corp.example.com did nothing more than rely on
connectivity for authentication (pretty common in small intranets), then
evil.example.net could steal the company's secret plans.

Now, Web Socket has a multi-layered approach to dealing with this.

- there is the handshake, which requires that the server respond with a
very specific set of bytes, thus guaranteeing that the server is in fact
WebSocket-aware. Any "wildcard" part to this handshake increases the risk
that there will be a server somewhere that can be tricked. For example, if
the handshake were "HTTP" followed by anything followed by "WebSocket",
then some HTTP servers could be tricked into doing the handshake -- for
example, the response to "GET /WebSocket" on the ietf.org host (not
www.ietf.org) includes the word WebSocket in the response.

HTTP servers aren't the only concern, of course; we want the handshake to
be as secure as possible against any other protocol that may exist on any
server that may be deployed. We don't know what's out there (especially in
intranets), so the handshake has to be pretty stringent to make an
accidental match as unlikely as possible.

- there is origin checking and location checking, and the location
checking isn't just an echo of the original request's data. This makes
causing the server to send back particular data harder. (It's also part of
our cross-origin security model and our shared hosting support; but that's
a separate discussion.)

- there is the framing, which does provide a modicum of protection by
forcing another byte in front of the first author-controlled packet sent.
(This isn't really a security feature, it's just a lucky accident of the
framing that we needed to turn TCP streams into packets.)
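
To make the "wildcard" risk in the first point concrete, here is a minimal
sketch in Python. The EXPECTED byte string and the sample 404 response are
illustrative assumptions, not text quoted from the spec; the point is only
that a loose pattern accepts a response from a server that never opted in,
while an exact prefix comparison does not.

```python
import re

# Assumed fixed handshake response, for illustration only. The real byte
# sequence is whatever the specification mandates.
EXPECTED = (
    b"HTTP/1.1 101 Web Socket Protocol Handshake\r\n"
    b"Upgrade: WebSocket\r\n"
    b"Connection: Upgrade\r\n"
)

# The loose rule from the example: "HTTP", then anything, then "WebSocket".
WILDCARD = re.compile(rb"HTTP.*WebSocket", re.DOTALL)


def strict_match(response: bytes) -> bool:
    """Accept only a response that starts with the exact expected bytes."""
    return response.startswith(EXPECTED)


def wildcard_match(response: bytes) -> bool:
    """Accept any response of the form HTTP <anything> WebSocket."""
    return WILDCARD.match(response) is not None


# A hypothetical error page from an ordinary HTTP server that echoes the
# requested path ("GET /WebSocket") back in the body.
ordinary = (
    b"HTTP/1.1 404 Not Found\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<p>/WebSocket was not found on this server.</p>"
)

print(wildcard_match(ordinary))  # the loose rule is fooled
print(strict_match(ordinary))    # the exact comparison is not
```

Every byte the check leaves unconstrained is a byte that some unrelated
server might happen to produce, which is why the exact comparison is the
safer design.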

> >> Ian, don't you see and understand the semantic difference between
> >> "speaking HTTP" and "speaking a magic bytecode that is intended to
> >> look HTTP-enough to fool a bunch of things until the upgrade process
> >> occurs" ? Don't you understand that the possible set of things that
> >> can go wrong here is quite unbounded ? Don't you understand the whole
> >> reason for "known ports" and protocol descriptions in the first
> >> place?
> >
> > Apparently not.
>
> Ok. Look at this.
>
> The byte sequence "GET / HTTP/1.0\r\nHost: foo\r\nConnection:
> close\r\n\r\n" is not byte equivalent to the sequence "GET /
> HTTP/1.0\r\nConnection: close\r\nHost: foo\r\n\r\n"
>
> The same byte sequence interpreted as a HTTP protocol exchange is
> equivalent.

Yes, but the client is a WebSocket client, not an HTTP client, so why
would it send anything but the WebSocket handshake?
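
For reference, the distinction being drawn in the quoted text can be
sketched in a few lines of Python. The parse_request helper is a
hypothetical, deliberately simplified parser (it ignores repeated headers
and line folding); it only shows that the two sequences differ as bytes
while parsing to the same request line and header set.

```python
def parse_request(raw: bytes):
    """Split a request head into (request line, header dict).

    Simplified for illustration: header names are lowercased and later
    duplicates overwrite earlier ones.
    """
    head = raw.split(b"\r\n\r\n", 1)[0]
    lines = head.split(b"\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


host_first = b"GET / HTTP/1.0\r\nHost: foo\r\nConnection: close\r\n\r\n"
connection_first = b"GET / HTTP/1.0\r\nConnection: close\r\nHost: foo\r\n\r\n"

# Different on the wire...
print(host_first == connection_first)
# ...but the same once interpreted as HTTP, since header order is ignored.
print(parse_request(host_first) == parse_request(connection_first))
```

An intermediary that interprets the exchange as HTTP is free to turn one
form into the other, whereas a handshake defined byte-for-byte would only
accept one of them.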

The only case I can see where the handshake gets changed is MITM proxies,
but as far as I understand it, there's no way to ever get a reliable
bidirectional non-HTTP TCP/IP connection through a Squid MITM proxy over
port 80 to a remote server that normally acts like an HTTP server, so it
doesn't matter anyway, since whatever we do, it won't work on that port.

> There's a mostly-expected understanding that what happens over port 80
> is HTTP. The few cases where that has broken (specifically Shoutcast,
> but I do see other crap on port 80 from time to time) were caused by
> people who implemented a mostly HTTP-looking protocol, tested that it
> mostly works via a few gateways/firewalls/proxies, and then deployed
> it.

Is there a way to get a reliable bidirectional non-HTTP TCP/IP connection
through a Squid MITM proxy over port 80 to a remote server that normally
acts like an HTTP server?

If not, then sending any data over port 80 on such a network wouldn't
work, right? So as far as I can tell, port 80 in such a scenario isn't
relevant. Authors in such scenarios would use port 443, like Mark said.

> You're intending to do stuff over tcp/80 which looks like HTTP but isn't
> HTTP.

No. I'm intending to do stuff over port 81. There is a desire in certain
cases to be able to share this traffic with servers running on port 80,
and in those cases, the content sent and the content required to be
returned by the server is valid HTTP until the Upgrade succeeds, at which
point it isn't HTTP.

> Everyone who implements anything HTTP gateway related (be it a
> transparent proxy, a firewall, a HTTP "router", etc) suddenly may have
> to implement your websockets stuff as well. So all of a sudden your
> attempt to not extend HTTP ends up extending HTTP.

Certainly I'm not going to stop people from making their software
compatible with WebSockets, but I don't see why they'd be required to do
so. The protocol is intentionally designed so that it can be tunnelled
(over port 443 with TLS) in a way that sidesteps all of that.

> >> The point is, there may be a whole lot of stuff going on with HTTP
> >> implementations that you're not aware of.
> >
> > Sure, but with the exception of man-in-the-middle proxies, this isn't a
> > big deal -- the people implementing the server side are in control of
> > what the HTTP implementation is doing.
>
> That may be your understanding of how the world works, but out here in
> the rest of the world, the people who deploy the edge and the people who
> deploy the core may not be the same people. There may be a dozen layers
> of red tape, equipment lifecycle, security features, etc, that need to
> be handled before "websockets happy" stuff can be deployed everywhere it
> needs to.

If there is an edge and a core, then there is more than one server, and
there's no need to share a port with an HTTP server.

> Please don't discount man-in-the-middle -anything- as being "easy" to
> deal with.

I'm not. In fact I'm saying that it's impossible to deal with, and that
people should side-step the entire issue and use a dedicated server with
no sharing with HTTP at all, if they have a setup like you describe.

> > In all cases except a man-in-the-middle proxy, this seems to be what
> > we do. I'm not sure how we can do anything in the case of such a
> > proxy, since by definition the client doesn't know it is present.
>
> .. so you're still not speaking HTTP?

I don't understand the relevance of your question. It doesn't matter what
you're talking; in the case of a MITM proxy, the client doesn't know it
isn't talking straight to the server. Talking HTTP more than we already do
doesn't suddenly mean we can get a two-way pipe set up.

> Ian, are you absolutely certain that everywhere you use "the internet",
> there is no "man in the middle" between you and the server you're
> speaking to?

In the case of TLS connections, yes.

> Haven't you ever worked at any form of corporate or enterprise
> environment? What about existing "captive portal" deployments like wifi
> hotspots, some of which still use squid-2.5 (eww!) as their http
> firewall/proxy to control access to the internet? That stuff is going to
> need upgrading sure, but I'd rather see the upgrade happen once to a
> well thought out and reasonably well designed protocol, versus having
> lots of little upgrades need to occur because your "HTTP but not quite
> HTTP" exchange on port 80 isn't thought out enough.

I really don't understand the problem you are alluding to here. What needs
upgrading in such a scenario?

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Fri Jul 17 2009 - 10:01:05 MDT
