Fixing the NodeJS compression library not taking effect in Express

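I have recently been optimizing my blog's load speed, and one of the most important measures is compressing content in transit. I picked a widely used NodeJS compression library, compression, more or less at random from a web search, but after wiring it in, the responses were not compressed as expected. This post records how I tracked down why compression was not being enabled.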

Using the NodeJS compression library

Following the official usage guide [Reference 1], we wrote the following demo:

import express from 'express';
import compression from 'compression';

const app = express();
app.use(compression());
app.get(
  '*',
  (request, response) => {
    response.end('<html><body>' + 'Hello World!'.repeat(1000) + '</body></html>');
  }
);
app.listen(
  5000,
  '0.0.0.0',
  () => console.log(`listening on port 5000 ...`)
);

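We expected that when a request carries an Accept-Encoding header, Express would answer with a Content-Encoding matching one of the encodings listed in that header. That is not what happened, as the following request shows: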

>> curl -v http://127.0.0.1:5000/ -H 'Accept-Encoding: gzip'
* Trying 127.0.0.1:5000...
* Connected to 127.0.0.1 (127.0.0.1) port 5000
* using HTTP/1.x
> GET / HTTP/1.1
> Host: 127.0.0.1:5000
> User-Agent: curl/8.12.1
> Accept: */*
> Accept-Encoding: gzip
>
* Request completely sent off
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Date: Thu, 27 Mar 2025 03:48:20 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
< Transfer-Encoding: chunked
<
* Connection #0 to host 127.0.0.1 left intact
<html><body>Hello World!……</body></html>

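As you can see, the response was not compressed. Yet we followed the official documentation, so why does the result not match our expectation?

Looking for the cause in the source code

Fortunately compression is open source. Browsing its source, we find a function named shouldCompress (its body is quoted later in this post).

Judging by its name, this function decides whether a response should be compressed, and it bases that decision on the Content-Type header.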

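This touches on a small Express detail [Reference 2]: calling end does not automatically add headers such as Content-Type to the response. As a result, shouldCompress reads the Content-Type header, gets undefined for type, and concludes that the response does not need to be compressed.

At this point the fix is clear: either add the header ourselves, or use a call that adds it automatically (for example, the send method sets this header for us). Here we take the approach of adding the header: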

app.get(
  '*',
  (request, response) => {
    response.type('text/html');
    response.end(...);
  }
);

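Note: the body passed to response.end must be long enough. Bodies below the middleware's threshold option (1kb by default) are not compressed, because for very small payloads the compressed output can end up longer than the original.

Now let's look at the response again: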
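The size effect behind this note can be checked with Node's built-in zlib module alone. The following is a minimal sketch rather than anything taken from compression itself; gzipSizes is a hypothetical helper name, and CommonJS require is used only so the snippet runs as a standalone script.

```javascript
// Standalone script: node gzip-sizes.js
const zlib = require('node:zlib');

// Hypothetical helper: byte size of a string before and after gzip.
function gzipSizes(text) {
  const original = Buffer.from(text, 'utf8');
  const compressed = zlib.gzipSync(original);
  return { original: original.length, compressed: compressed.length };
}

// A 12-byte body: the gzip header and trailer outweigh any savings.
const tiny = gzipSizes('Hello World!');
// The same text repeated 1000 times: highly redundant, shrinks a lot.
const large = gzipSizes('Hello World!'.repeat(1000));

console.log('tiny :', tiny);
console.log('large:', large);
```

For the tiny body the compressed size comes out larger than the original, which is the situation a size threshold is meant to avoid.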

>> curl -v http://127.0.0.1:5000/ -H 'Accept-Encoding: gzip'
* Trying 127.0.0.1:5000...
* Connected to 127.0.0.1 (127.0.0.1) port 5000
* using HTTP/1.x
> GET / HTTP/1.1
> Host: 127.0.0.1:5000
> User-Agent: curl/8.12.1
> Accept: */*
> Accept-Encoding: gzip
>
* Request completely sent off
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Content-Type: text/html; charset=utf-8
< Vary: Accept-Encoding
< Content-Encoding: gzip
< Date: Thu, 27 Mar 2025 04:20:48 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
< Transfer-Encoding: chunked
<
Warning: Binary output can mess up your terminal. Use "--output -" to tell curl to output it to your terminal anyway, or consider "--output <FILE>" to save to a file.
* client returned ERROR on write of 10 bytes
* Failed reading the chunked-encoded stream
* closing connection #0

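The output now warns that the body is binary, and the response headers include the corresponding Content-Encoding.

However, as debugging went on, I found that some content was still not being compressed.

Other cases that are not compressed

First, take a look at the following code: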

import fs from 'fs';
import express from 'express';
import compression from 'compression';

const app = express();
app.use(compression());
app.get(
  '*',
  (request, response) => {
    response.type('image/jpeg');
    response.end(fs.readFileSync('./example.jpeg'));
  }
);
app.listen(
  5000,
  '0.0.0.0',
  () => console.log(`listening on port 5000 ...`)
);

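This is essentially the same as the code at the start of this post. The only differences are that the response body is now an image (binary data), and that this time we call type to mark the content as 'image/jpeg' in the response headers.

By the analysis above, this content should be compressed. In fact, it is not: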

>> curl -v http://127.0.0.1:5000/ -H 'Accept-Encoding: gzip' | hexdump -C | head -3
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 127.0.0.1:5000...
* Connected to 127.0.0.1 (127.0.0.1) port 5000
* using HTTP/1.x
> GET / HTTP/1.1
> Host: 127.0.0.1:5000
> User-Agent: curl/8.12.1
> Accept: */*
> Accept-Encoding: gzip
>
* Request completely sent off
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Content-Type: image/jpeg
< Date: Thu, 27 Mar 2025 04:36:08 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
< Transfer-Encoding: chunked
<
{ [32588 bytes data]
100 67275 0 67275 0 0 43.1M 0 --:--:-- --:--:-- --:--:-- 64.1M
* Connection #0 to host 127.0.0.1 left intact
00000000 ff d8 ff e0 00 10 4a 46 49 46 00 01 01 01 00 48 |......JFIF.....H|
00000010 00 48 00 00 ff db 00 43 00 03 02 02 03 02 02 03 |.H.....C........|
00000020 03 03 03 04 03 03 04 05 08 05 05 04 04 05 0a 07 |................|

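As the response above shows, there is no Content-Encoding header (Content-Type: image/jpeg is present), and the body is the raw jpeg image.

What keeps this content from being compressed? It looks like we need to go back to the compression source for the answer.

Back to the source

You may not have noticed that shouldCompress, mentioned above, contains one more rather inconspicuous function call: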
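One way to tell a compressed body from a raw one is to inspect its leading bytes: a gzip stream starts with 1f 8b, while the hexdump above starts with ff d8 ff, the JPEG signature. The sketch below illustrates that check; sniffFormat is a hypothetical helper, not part of compression or Express.

```javascript
const zlib = require('node:zlib');

// Hypothetical helper: guess the payload format from its leading magic bytes.
function sniffFormat(buffer) {
  if (buffer.length >= 2 && buffer[0] === 0x1f && buffer[1] === 0x8b) {
    return 'gzip';
  }
  if (buffer.length >= 3 && buffer[0] === 0xff && buffer[1] === 0xd8 && buffer[2] === 0xff) {
    return 'jpeg';
  }
  return 'unknown';
}

// First four bytes of the hexdump above: ff d8 ff e0
const jpegHead = Buffer.from([0xff, 0xd8, 0xff, 0xe0]);
// A body gzipped locally, standing in for a compressed response.
const gzipped = zlib.gzipSync('Hello World!'.repeat(1000));

console.log(sniffFormat(jpegHead));
console.log(sniffFormat(gzipped));
```

Combined with the absence of a Content-Encoding header, the ff d8 ff prefix indicates the image was sent as-is.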

function shouldCompress (req, res) {
  var type = res.getHeader('Content-Type')
  if (type === undefined || !compressible(type)) {
    debug('%s not compressible', type)
    return false
  }
  return true
}

So what does this compressible function do?

var db = require('mime-db')

function compressible (type) {
  if (!type || typeof type !== 'string') {
    return false
  }
  // strip parameters
  var match = EXTRACT_TYPE_REGEXP.exec(type)
  var mime = match && match[1].toLowerCase()
  var data = db[mime]
  // return database information
  if (data && data.compressible !== undefined) {
    return data.compressible
  }
  // fallback to regexp or unknown
  return COMPRESSIBLE_TYPE_REGEXP.test(mime) || undefined
}

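As you can see, it takes the content type and looks it up in the mime-db library. mime-db is essentially a JSON file that records information about each MIME type. The entry for jpeg, the type relevant here, looks like this: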

{
  ....
  "image/jpeg": {
    "source": "iana",
    "compressible": false,
    "extensions": ["jpeg","jpg","jpe"]
  },
  ....
}

compressible is false: so jpeg is marked as not compressible.
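To make the lookup concrete, here is a self-contained sketch of the same decision flow. It is an illustration under assumptions, not the library's code: miniDb is a hypothetical three-entry stand-in for mime-db, the two regular expressions are my approximation of the constants named in the excerpt, and isCompressible returns false instead of undefined for unknown types to keep the result a plain boolean.

```javascript
// Hypothetical stand-in for the much larger mime-db JSON file.
const miniDb = {
  'text/html': { compressible: true },
  'image/jpeg': { compressible: false },
  'application/octet-stream': { compressible: false },
};

// Assumed patterns, named after the constants used in the excerpt above.
const EXTRACT_TYPE_REGEXP = /^\s*([^;\s]*)(?:;|\s|$)/;
const COMPRESSIBLE_TYPE_REGEXP = /^text\/|\+(?:json|text|xml)$/i;

function isCompressible(contentType) {
  if (!contentType || typeof contentType !== 'string') {
    return false;
  }
  // Strip parameters such as "; charset=utf-8".
  const match = EXTRACT_TYPE_REGEXP.exec(contentType);
  const mime = match && match[1].toLowerCase();
  // Prefer the database entry when it has an explicit answer.
  const entry = miniDb[mime];
  if (entry && entry.compressible !== undefined) {
    return entry.compressible;
  }
  // Otherwise fall back to the pattern for text-like types.
  return COMPRESSIBLE_TYPE_REGEXP.test(mime);
}

console.log(isCompressible('text/html; charset=utf-8'));
console.log(isCompressible('image/jpeg'));
console.log(isCompressible('application/vnd.example+json'));
```

The first call is answered by the table after the charset parameter is stripped, the second is rejected by the table, and the third is absent from the table and so falls through to the pattern.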

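According to Wikipedia, jpeg is a lossy compression image format [Reference 3]. In other words, images in this format have already been compressed, so compressing them again gains little.

So not every file transferred over HTTP is compressible. Don't assume compression is broken just because a response has no Content-Encoding header.

Final result

In the screenshot, some js files of several tens of kB need only a few kB on the wire after compression. For a site with heavy traffic, this can noticeably reduce bandwidth pressure.

Well, that's another filler post done, and I'm quite happy about it ;)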
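The point that already-compressed data gains little from a second pass can be illustrated with Node's built-in modules. This is a rough sketch under an assumption: random bytes stand in for the high-entropy payload of a JPEG, since no real image is bundled here, and gzipRatio is a hypothetical helper name.

```javascript
const zlib = require('node:zlib');
const crypto = require('node:crypto');

// Hypothetical helper: gzipped size divided by original size.
// Values near (or above) 1 mean gzip brought no benefit.
function gzipRatio(buffer) {
  return zlib.gzipSync(buffer).length / buffer.length;
}

// Highly redundant text, like the HTML demo earlier in the post.
const text = Buffer.from('Hello World!'.repeat(1000));
// Assumption: random bytes approximate already-compressed data.
const dense = crypto.randomBytes(64 * 1024);

console.log('text ratio :', gzipRatio(text).toFixed(3));
console.log('dense ratio:', gzipRatio(dense).toFixed(3));
```

The text compresses to a small fraction of its size, while the dense buffer stays at roughly its original size, which is why a database flag that skips such types saves CPU time without costing bandwidth.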
