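Introduction

Many live-streaming sites, and other sites that need data delivered in near real time, use WebSocket. How can this kind of data be scraped?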

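This article uses socket.io as the example. To find out whether a site uses the native HTML5 WebSocket API or a JS library such as socket.io, inspect the page source.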
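The page-source check can be roughed out in code. The sketch below is only a heuristic: the class name `WsLibraryDetector` and the two marker strings it searches for are assumptions made for illustration, not an exhaustive or official way to identify the library.

```java
import java.util.Locale;

// Heuristic sketch: the marker strings are assumptions, not an exhaustive list.
class WsLibraryDetector {

    // Guesses which WebSocket approach a page uses from its HTML source.
    static String detect(String pageSource) {
        String html = pageSource.toLowerCase(Locale.ROOT);
        // socket.io pages usually load a script whose path contains "socket.io"
        if (html.contains("socket.io")) {
            return "socket.io";
        }
        // The native API is used through the WebSocket constructor
        if (html.contains("new websocket(")) {
            return "native WebSocket";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(detect("<script src=\"/socket.io/socket.io.js\"></script>"));
        System.out.println(detect("<script>var ws = new WebSocket('ws://localhost:9098');</script>"));
    }
}
```

A result of "unknown" does not prove that the site has no WebSocket connection: the connection code may live in an external script file, in which case the Network panel of the browser is the more reliable place to look.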

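The examples are written in Java. Assume the site uses the socket.io library for message push. How can a Java program receive the messages pushed by the server?

socket.io provides a Java client implementation, socket.io-client. Receiving the data pushed by the server therefore comes down to connecting to the WebSocket server as a client.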

Connecting to the WebSocket server and receiving pushed data

First, add the dependency:

<dependency>
  <groupId>io.socket</groupId>
  <artifactId>socket.io-client</artifactId>
  <version>1.0.0</version>
</dependency>

Java code for a ws address

import io.socket.client.IO;
import io.socket.client.Socket;
import io.socket.emitter.Emitter;
import org.json.JSONObject;

// WsClientDemo is just a placeholder class name.
public class WsClientDemo {
    public static void main(String[] args) throws Exception {
        final Socket socket = IO.socket("http://localhost:9098/");
        socket.on(Socket.EVENT_CONNECTING, new Emitter.Listener() {
            @Override
            public void call(Object... objects) {
                System.out.println("WebSocket connecting");
            }
        }).on(Socket.EVENT_CONNECT, new Emitter.Listener() {
            @Override
            public void call(Object... objects) {
                System.out.println("WebSocket connected!");
            }
        }).on("OnMSG", new Emitter.Listener() { // name of the event to receive
            @Override
            public void call(Object... objects) {
                // The first argument is the JSON payload pushed by the server
                JSONObject object = (JSONObject) objects[0];
                System.out.println("OnMSG:\n" + object.toString());
            }
        }).on(Socket.EVENT_DISCONNECT, new Emitter.Listener() {
            @Override
            public void call(Object... objects) {
                System.out.println("WebSocket connection closed!");
            }
        });
        // Connect to the WebSocket server
        socket.connect();
        // Sleep for 3 s, then send a message to the WebSocket server
        Thread.sleep(3000L);
        if (socket.connected()) {
            JSONObject msg = new JSONObject();
            msg.put("from", "jack");
            msg.put("to", "admin");
            msg.put("content", "Hello,I am jack!");
            socket.emit("OnMSG", msg);
        }
    }
}

If the WebSocket address is ws://localhost:9098, write it as http://localhost:9098 in the socket.io Java client; if it is wss://localhost:9098, write it as https://localhost:9098.
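The scheme mapping can be captured in a small helper. This is a minimal sketch; the class and method names (`SocketIoUrls`, `toSocketIoUrl`) are hypothetical and not part of socket.io-client.

```java
// Hypothetical helper: rewrites a ws:// or wss:// address into the
// http:// or https:// form that is passed to IO.socket(...).
class SocketIoUrls {

    static String toSocketIoUrl(String wsUrl) {
        if (wsUrl.startsWith("wss://")) {
            return "https://" + wsUrl.substring("wss://".length());
        }
        if (wsUrl.startsWith("ws://")) {
            return "http://" + wsUrl.substring("ws://".length());
        }
        // Already http(s) or something else: leave it unchanged
        return wsUrl;
    }

    public static void main(String[] args) {
        System.out.println(toSocketIoUrl("ws://localhost:9098"));
        System.out.println(toSocketIoUrl("wss://localhost:9098"));
    }
}
```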

For a server-side example, see: socketio推送技术 (socket.io push technology)

Java code for a wss address

import io.socket.client.IO;
import io.socket.client.Socket;
import okhttp3.OkHttpClient;

import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSession;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;

// WssClientDemo is just a placeholder class name.
public class WssClientDemo {

    // Accepts any hostname. This disables hostname verification, so use it only for testing.
    static HostnameVerifier getMyHostnameVerifier() {
        return new HostnameVerifier() {
            @Override
            public boolean verify(String s, SSLSession sslSession) {
                return true;
            }
        };
    }

    // Trusts every certificate. This disables certificate validation, so use it only for testing.
    static X509TrustManager getTrustManager() {
        return new X509TrustManager() {
            @Override
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[] {};
            }
            @Override
            public void checkClientTrusted(X509Certificate[] certs, String authType) {
            }
            @Override
            public void checkServerTrusted(X509Certificate[] certs, String authType) {
            }
        };
    }

    // Errors are propagated to the caller instead of being swallowed and returning null
    static SSLContext createSSLContext(X509TrustManager trustManager) throws GeneralSecurityException {
        // Install the all-trusting trust manager
        SSLContext sc = SSLContext.getInstance("TLS");
        sc.init(null, new TrustManager[]{trustManager}, new SecureRandom());
        return sc;
    }

    public static void main(String[] args) throws Exception {
        X509TrustManager trustManager = getTrustManager();
        SSLContext sslContext = createSSLContext(trustManager);
        OkHttpClient okHttpClient = new OkHttpClient.Builder()
                .hostnameVerifier(getMyHostnameVerifier())
                .sslSocketFactory(sslContext.getSocketFactory(), trustManager)
                .build();
        // default settings for all sockets
        IO.setDefaultOkHttpWebSocketFactory(okHttpClient);
        IO.setDefaultOkHttpCallFactory(okHttpClient);
        // set as an option
        IO.Options opts = new IO.Options();
        opts.callFactory = okHttpClient;
        opts.webSocketFactory = okHttpClient;
        opts.port = 9098; // same port as in the URL below
        opts.secure = true;
        // A wss:// address is written as https://
        final Socket socket = IO.socket("https://localhost:9098/", opts);
        // ...
        // The rest of the code (listeners, connect(), emit) is the same as in the ws example.
    }
}

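In the browser's developer tools, the Frames tab of the WebSocket request shows the messages sent and received, as shown in the figure:

An upward arrow (green) marks a message sent to the WebSocket server; a downward arrow (red) marks a received message.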
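When reading the Frames tab, note that socket.io traffic is not bare JSON. An event frame starts with `42` (Engine.IO packet type 4, message, followed by Socket.IO packet type 2, event), then a JSON array whose first element is the event name. Single-digit frames such as `2` and `3` are ping/pong heartbeats. The sketch below pulls the event name out of such a frame, which is the name to pass to `socket.on(...)`. The class `SocketIoFrames` is hypothetical, and the parsing is deliberately naive: it assumes the event name contains no escaped quotes.

```java
// Hypothetical helper for reading frames copied from the browser's Frames tab.
class SocketIoFrames {

    // Returns the event name of a Socket.IO event frame, or null for any other frame.
    static String eventName(String frame) {
        // "4" = Engine.IO message, "2" = Socket.IO event
        if (!frame.startsWith("42")) {
            return null;
        }
        // The payload is a JSON array: ["eventName", arg1, ...]
        int arrayStart = frame.indexOf("[\"");
        if (arrayStart < 0) {
            return null;
        }
        int nameStart = arrayStart + 2;
        int nameEnd = frame.indexOf('"', nameStart);
        if (nameEnd < 0) {
            return null;
        }
        return frame.substring(nameStart, nameEnd);
    }

    public static void main(String[] args) {
        String frame = "42[\"OnMSG\",{\"from\":\"jack\",\"to\":\"admin\"}]";
        System.out.println(eventName(frame));
    }
}
```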

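Note

Some sites add a token or other verification parameters to the URL to block untrusted connections. Typically the token is generated after a member logs in and is then used to open the WebSocket connection. In that case you have to obtain and pass the token yourself.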
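As a rough sketch of passing such a token, the helper below builds a URL-encoded query string. The class name `TokenQuery` and the parameter name `token` are assumptions; the real parameter name has to be read from the site's own WebSocket request. In the 1.x Java client, `IO.Options` has a `query` string field, which is where the result would normally go.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the query string that carries the token.
class TokenQuery {

    static String build(String paramName, String token) {
        try {
            // Encode both parts so that characters such as '/', '=' or spaces stay valid in a URL
            return URLEncoder.encode(paramName, "UTF-8") + "=" + URLEncoder.encode(token, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String query = build("token", "a/b=");
        System.out.println(query);
        // Assumed usage with socket.io-client 1.x:
        //   IO.Options opts = new IO.Options();
        //   opts.query = query;
        //   Socket socket = IO.socket("http://localhost:9098/", opts);
    }
}
```

How the token is obtained (a login request, a cookie, a value embedded in the page) differs from site to site and is not covered by this sketch.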