Author Avatar
Jafir Mar 01, 2024

前言

作者目前打算分享一期关于devOps系列的文章,希望对热爱学习和探索的你有所帮助。

文章主要记录一些简洁、高效的运维部署指令,旨在 记录和能够快速地构建系统。就像运维文档或者手册一样,方便进行系统的重建、改造和优化。每篇文章独立出来,可以单独作为其中一项组件的部署和使用。

本章为 devOps系列(八)efk+prometheus+grafana日志监控和告警

大纲

devOps系列介绍

devOps系列(一)docker搭建

devOps系列(二)gitlab搭建

devOps系列(三)nexus-harbor搭建

devOps系列(四)jenkins搭建

devOps系列(五)efk系统搭建

devOps系列(六)grafana+prometheus搭建

devOps系列(七)grafana+prometheus监控告警

devOps系列(八)efk+prometheus+grafana日志监控和告警

正文

日志收集

目前我们已经搭建好了efk日志系统,接下来就是把日志数据采集进来。

目前java程序的采集,可以在框架侧写一个基于logback日志收集starter依赖框架,便于日志收集的安装和管理。

可以自建一个starter依赖工程项目,也可以直接植入项目工程。

注:本文着重介绍核心原理,可能无法直接使用

需要引入的依赖

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
<dependency>
<groupId>com.sndyuk</groupId>
<artifactId>logback-more-appenders</artifactId>
</dependency>

<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
</dependency>
<dependency>
<groupId>org.komamitsu</groupId>
<artifactId>fluency-core</artifactId>
</dependency>
<dependency>
<groupId>org.komamitsu</groupId>
<artifactId>fluency-fluentd</artifactId>
</dependency>

核心logback配置文件 logback-spring.xml

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
<?xml version="1.0" encoding="UTF-8"?>
<configuration debug="false">
<!--定义日志文件的存储地址 勿在 LogBack 的配置中使用相对路径-->
<springProperty name="profile" source="spring.profiles.active"/>
<springProperty name="applicationName" source="spring.application.name"/>
<!-- 默认地址 -->
<springProperty name="fluentdAddr" source="framework.logback.fluentd-addr" defaultValue="fluentd.jafir.top"/>
<property name="LOG_HOME" value="/${applicationName}/logs"/>
<!-- 控制台输出 -->
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
<!--格式化输出:%d表示日期,%thread表示线程名,%-5level:级别从左显示5个字符宽度%msg:日志消息,%n是换行符-->
<pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{50} - %msg%n</pattern>
</encoder>
</appender>
<!-- info及其以上日志 -->
<appender name="LOCAL_ALL" class="ch.qos.logback.core.rolling.RollingFileAppender">
<filter class="ch.qos.logback.classic.filter.ThresholdFilter">
<level>INFO</level>
</filter>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<!--日志文件输出的文件名-->
<FileNamePattern>${LOG_HOME}/info_log.%d{yyyy-MM-dd}.log</FileNamePattern>
<!--日志文件保留天数-->
<MaxHistory>30</MaxHistory>
</rollingPolicy>
<encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
<!--格式化输出:%d表示日期,%thread表示线程名,%-5level:级别从左显示5个字符宽度%msg:日志消息,%n是换行符-->
<pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{50} - %msg%n</pattern>
<!-- 设置编码格式,以防中文乱码 -->
<charset class="java.nio.charset.Charset">UTF-8</charset>
</encoder>
<!--日志文件最大的大小-->
<triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
<MaxFileSize>10MB</MaxFileSize>
</triggeringPolicy>
</appender>

<!-- 错误日志 -->
<appender name="LOCAL_ERROR" class="ch.qos.logback.core.rolling.RollingFileAppender">
<filter class="ch.qos.logback.classic.filter.LevelFilter">
<level>ERROR</level>
<onMatch>ACCEPT</onMatch>
<onMismatch>DENY</onMismatch>
</filter>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<!--日志文件输出的文件名-->
<FileNamePattern>${LOG_HOME}/error_log.%d{yyyy-MM-dd}.log</FileNamePattern>
<!--日志文件保留天数-->
<MaxHistory>30</MaxHistory>
</rollingPolicy>
<encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
<!--格式化输出:%d表示日期,%thread表示线程名,%-5level:级别从左显示5个字符宽度%msg:日志消息,%n是换行符-->
<pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{50} - %msg%n</pattern>
<!-- 设置编码格式,以防中文乱码 -->
<charset class="java.nio.charset.Charset">UTF-8</charset>
</encoder>
<!--日志文件最大的大小-->
<triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
<MaxFileSize>10MB</MaxFileSize>
</triggeringPolicy>
</appender>

<!-- Fluency -->
<appender name="FLUENCY_SYNC" class="ch.qos.logback.more.appenders.FluencyLogbackAppender">
<!-- Tag for Fluentd. Farther information: http://docs.fluentd.org/articles/config-file -->
<!-- 微服务名 -->
<tag>${applicationName}</tag>
<!-- [Optional] Label for Fluentd. Farther information: http://docs.fluentd.org/articles/config-file -->

<!-- Host name/address and port number which Fluentd placed -->
<remoteHost>${fluentdAddr}</remoteHost>
<port>24224</port>

<!-- [Optional] Multiple name/addresses and port numbers which Fluentd placed
<remoteServers>
<remoteServer>
<host>primary</host>
<port>24224</port>
</remoteServer>
<remoteServer>
<host>secondary</host>
<port>24224</port>
</remoteServer>
</remoteServers>
-->

<!-- [Optional] Additional fields(Pairs of key: value) -->
<!-- 环境 -->
<additionalField>
<key>env</key>
<value>${profile}</value>
</additionalField>

<!-- [Optional] Configurations to customize Fluency's behavior: https://github.com/komamitsu/fluency#usage -->
<ackResponseMode>false</ackResponseMode>
<!-- <fileBackupDir>/tmp</fileBackupDir> -->
<bufferChunkInitialSize>33554432</bufferChunkInitialSize>
<bufferChunkRetentionSize>268435456</bufferChunkRetentionSize>
<maxBufferSize>1073741824</maxBufferSize>
<bufferChunkRetentionTimeMillis>1000</bufferChunkRetentionTimeMillis>
<connectionTimeoutMilli>5000</connectionTimeoutMilli>
<readTimeoutMilli>5000</readTimeoutMilli>
<waitUntilBufferFlushed>30</waitUntilBufferFlushed>
<waitUntilFlusherTerminated>40</waitUntilFlusherTerminated>
<flushAttemptIntervalMillis>200</flushAttemptIntervalMillis>
<senderMaxRetryCount>12</senderMaxRetryCount>
<!-- [Optional] Enable/Disable use of EventTime to get sub second resolution of log event date-time -->
<useEventTime>true</useEventTime>
<sslEnabled>false</sslEnabled>
<!-- [Optional] Enable/Disable use the of JVM Heap for buffering -->
<jvmHeapBufferMode>false</jvmHeapBufferMode>
<!-- [Optional] If true, Map Marker is expanded instead of nesting in the marker name -->
<flattenMapMarker>false</flattenMapMarker>
<!-- [Optional] default "marker" -->
<markerPrefix></markerPrefix>

<!-- [Optional] Message encoder if you want to customize message -->
<encoder>
<pattern><![CDATA[%-5level %logger{50}#%line %message]]></pattern>
</encoder>

<!-- [Optional] Message field key name. Default: "message" -->
<messageFieldKeyName>msg</messageFieldKeyName>

</appender>

<!-- Fluency -->
<appender name="FLUENCY_SYNC_ACCESS" class="ch.qos.logback.more.appenders.FluencyLogbackAppender">
<!-- Tag for Fluentd. Farther information: http://docs.fluentd.org/articles/config-file -->
<!-- 微服务名 -->
<tag>access-${applicationName}</tag>
<!-- [Optional] Label for Fluentd. Farther information: http://docs.fluentd.org/articles/config-file -->

<!-- Host name/address and port number which Fluentd placed -->
<remoteHost>${fluentdAddr}</remoteHost>
<port>24224</port>

<!-- [Optional] Multiple name/addresses and port numbers which Fluentd placed
<remoteServers>
<remoteServer>
<host>primary</host>
<port>24224</port>
</remoteServer>
<remoteServer>
<host>secondary</host>
<port>24224</port>
</remoteServer>
</remoteServers>
-->

<!-- [Optional] Additional fields(Pairs of key: value) -->
<!-- 环境 -->
<additionalField>
<key>env</key>
<value>${profile}</value>
</additionalField>

<!-- [Optional] Configurations to customize Fluency's behavior: https://github.com/komamitsu/fluency#usage -->
<ackResponseMode>false</ackResponseMode>
<!-- <fileBackupDir>/tmp</fileBackupDir> -->
<bufferChunkInitialSize>33554432</bufferChunkInitialSize>
<bufferChunkRetentionSize>268435456</bufferChunkRetentionSize>
<maxBufferSize>1073741824</maxBufferSize>
<bufferChunkRetentionTimeMillis>1000</bufferChunkRetentionTimeMillis>
<connectionTimeoutMilli>5000</connectionTimeoutMilli>
<readTimeoutMilli>5000</readTimeoutMilli>
<waitUntilBufferFlushed>30</waitUntilBufferFlushed>
<waitUntilFlusherTerminated>40</waitUntilFlusherTerminated>
<flushAttemptIntervalMillis>200</flushAttemptIntervalMillis>
<senderMaxRetryCount>12</senderMaxRetryCount>
<!-- [Optional] Enable/Disable use of EventTime to get sub second resolution of log event date-time -->
<useEventTime>true</useEventTime>
<sslEnabled>false</sslEnabled>
<!-- [Optional] Enable/Disable use the of JVM Heap for buffering -->
<jvmHeapBufferMode>false</jvmHeapBufferMode>
<!-- [Optional] If true, Map Marker is expanded instead of nesting in the marker name -->
<flattenMapMarker>false</flattenMapMarker>
<!-- [Optional] default "marker" -->
<markerPrefix></markerPrefix>

<!-- [Optional] Message encoder if you want to customize message -->
<encoder>
<pattern>%message%n</pattern>
</encoder>

<!-- [Optional] Message field key name. Default: "message" -->
<messageFieldKeyName>msg</messageFieldKeyName>

</appender>

<appender name="FLUENCY" class="ch.qos.logback.classic.AsyncAppender">
<!-- Max queue size of logs which is waiting to be sent (When it reach to the max size, the log will be disappeared). -->
<queueSize>999</queueSize>
<!-- Never block when the queue becomes full. -->
<neverBlock>true</neverBlock>
<!-- The default maximum queue flush time allowed during appender stop.
If the worker takes longer than this time it will exit, discarding any remaining items in the queue.
10000 millis
-->
<maxFlushTime>1000</maxFlushTime>
<appender-ref ref="FLUENCY_SYNC"/>
</appender>

<appender name="FLUENCY_ACCESS" class="ch.qos.logback.classic.AsyncAppender">
<!-- Max queue size of logs which is waiting to be sent (When it reach to the max size, the log will be disappeared). -->
<queueSize>999</queueSize>
<!-- Never block when the queue becomes full. -->
<neverBlock>true</neverBlock>
<!-- The default maximum queue flush time allowed during appender stop.
If the worker takes longer than this time it will exit, discarding any remaining items in the queue.
10000 millis
-->
<maxFlushTime>1000</maxFlushTime>
<appender-ref ref="FLUENCY_SYNC_ACCESS"/>
</appender>

<springProfile name="local">
<!-- 日志输出级别 -->
<root level="INFO">
<appender-ref ref="STDOUT"/>
<appender-ref ref="FLUENCY"/>
<!-- <appender-ref ref="LOCAL_ALL"/>-->
<!-- <appender-ref ref="LOCAL_ERROR"/>-->
<!-- <appender-ref ref="FLUENCY"/>-->
</root>
<logger name="com.jafir.logback.aop.WebLogAspect" level="INFO" additivity="false">
<appender-ref ref="STDOUT"/>
<!-- <appender-ref ref="FLUENCY_ACCESS"/>-->
</logger>
</springProfile>

<springProfile name="dev,test,preprod">
<!-- 日志输出级别 -->
<root level="INFO">
<appender-ref ref="STDOUT"/>
<!-- <appender-ref ref="LOCAL_ALL"/>-->
<!-- <appender-ref ref="LOCAL_ERROR"/>-->
<appender-ref ref="FLUENCY"/>
</root>
<logger name="com.jafir.logback.aop.WebLogAspect" level="INFO" additivity="false">
<appender-ref ref="STDOUT"/>
<appender-ref ref="FLUENCY_ACCESS"/>
</logger>
</springProfile>

<springProfile name="prod">
<!-- 日志输出级别 -->
<root level="INFO">
<appender-ref ref="STDOUT"/>
<!-- <appender-ref ref="LOCAL_ALL"/>-->
<!-- <appender-ref ref="LOCAL_ERROR"/>-->
<appender-ref ref="FLUENCY"/>
</root>
<logger name="com.jafir.logback.aop.WebLogAspect" level="INFO" additivity="false">
<appender-ref ref="STDOUT"/>
<appender-ref ref="FLUENCY_ACCESS"/>
</logger>
</springProfile>

<!-- 关闭某个日志打印 -->
<logger name="org.komamitsu.fluency.Fluency" level="OFF" />
<logger name="org.komamitsu.fluency.fluentd.ingester.sender.RetryableSender" level="OFF" />
<logger name="org.komamitsu.fluency.fluentd.ingester.sender.NetworkSender" level="OFF" />

</configuration>

springBoot的AutoConfiguration类

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
package com.jafir.logback;

import com.jafir.logback.aop.WebLogAspect;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.boot.context.properties.EnableConfigurationProperties;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Import;


@Configuration
@Import(WebLogAspect.class)
@EnableConfigurationProperties(LogbackProperties.class)
@ConditionalOnProperty(prefix = "framework.logback", value = "enabled", havingValue = "true", matchIfMissing = true)
@ComponentScan(value = "com.jafir.logback")
public class LogbackAutoConfiguration {
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
package com.jafir.logback;

import org.springframework.boot.context.properties.ConfigurationProperties;

import java.util.List;

@ConfigurationProperties("framework.logback")
public class LogbackProperties {
private Boolean enabled = false;
private String fluentdAddr = "fluentd.jaifr.top";
private List<String> excludeUrl;

public List<String> getExcludeUrl() {
return excludeUrl;
}

public void setExcludeUrl(List<String> excludeUrl) {
this.excludeUrl = excludeUrl;
}

public Boolean getEnabled() {
return enabled;
}

public void setEnabled(Boolean enabled) {
this.enabled = enabled;
}

public String getFluentdAddr() {
return fluentdAddr;
}

public void setFluentdAddr(String fluentdAddr) {
this.fluentdAddr = fluentdAddr;
}
}

可以通过yml配置文件来进行装配控制

framework.logback.enabled 控制是否开启日志收集

framework.logback.fluentdAddr 设置fluentd的地址

framework.logback.excludeUrl 设置过滤不进行收集的地址

核心servlet拦截器类

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
package com.jafir.logback.aop;

import cn.hutool.core.collection.CollUtil;
import cn.hutool.http.HttpStatus;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.gson.Gson;
import com.jafir.logback.LogResponseBody;
import com.jafir.logback.LogbackProperties;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.io.IOUtils;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.web.servlet.HandlerMapping;
import org.springframework.web.util.ContentCachingResponseWrapper;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.*;

@Aspect
@Slf4j
public class WebLogAspect {
private final boolean NEED_RESPONSE_BODY = true;


private final LogbackProperties logbackProperties;
private final ObjectMapper objectMapper;


private static final List<String> DEFAULT_EXCLUDE_URL = new ArrayList<>();

static {
DEFAULT_EXCLUDE_URL.add("/actuator/prometheus");
DEFAULT_EXCLUDE_URL.add("/health/detect");
}

public WebLogAspect(ObjectMapper objectMapper, LogbackProperties logbackProperties) {
this.objectMapper = objectMapper;
this.logbackProperties = logbackProperties;

if (CollUtil.isNotEmpty(logbackProperties.getExcludeUrl())) {
logbackProperties.getExcludeUrl().addAll(DEFAULT_EXCLUDE_URL);
} else {
logbackProperties.setExcludeUrl(DEFAULT_EXCLUDE_URL);
}
}

@Around("execution(public void javax.servlet.http.HttpServlet.service(..)))")
public Object webLog(ProceedingJoinPoint joinPoint) throws Throwable {
Object[] args = joinPoint.getArgs();
DelegateHttpRequest servletRequest = new DelegateHttpRequest((HttpServletRequest) args[0]);
HttpServletResponse servletResponse = (HttpServletResponse) args[1];
ContentCachingResponseWrapper responseWrapper = null;
if (NEED_RESPONSE_BODY) {
responseWrapper = new ContentCachingResponseWrapper(servletResponse);
}


// 过滤不需要拦截的请求
if (doNotIntercept(servletRequest)) {
return joinPoint.proceed();
}


WebLog webLog = new WebLog();
webLog.setTimestamp(Instant.now());

WebLog.Request request = new WebLog.Request();

InputStream servletRequestStream = servletRequest.getInputStream();

int size;
byte[] buffer = new byte[1024];
ByteArrayOutputStream tmpRequestStream = new ByteArrayOutputStream();
while ((size = servletRequestStream.read(buffer)) != -1) {
tmpRequestStream.write(buffer, 0, size);
}

request.setBody(tmpRequestStream.toString());

Map<String, List<String>> requestHeaders = new HashMap<>();

Enumeration<String> servletRequestHeaders = servletRequest.getHeaderNames();
while (servletRequestHeaders.hasMoreElements()) {
String header = servletRequestHeaders.nextElement();
Enumeration<String> values = servletRequest.getHeaders(header);
List<String> list = new ArrayList<>();
while (values.hasMoreElements()) {
String value = values.nextElement();
list.add(value);
}
requestHeaders.put(header, list);
}
request.setHeaders(requestHeaders);

request.setMethod(servletRequest.getMethod());

Object rawUrl = servletRequest.getAttribute("raw-api-uri");
if (rawUrl instanceof String) {
request.setRequestUri((String) rawUrl);
} else {
request.setRequestUri(servletRequest.getRequestURI());
}

Map<String, List<String>> parameters = new HashMap<>();
for (Map.Entry<String, String[]> entry : servletRequest.getParameterMap().entrySet()) {
List<String> list = new ArrayList<>(Arrays.asList(entry.getValue()));
parameters.put(entry.getKey(), list);
}
request.setParameters(parameters);

Object attributeStart = servletRequest.getAttribute("raw-api-start");
long start;

if (attributeStart instanceof Long) {
start = (long) attributeStart;
} else {
start = System.nanoTime();
}

Object value;
try {
if (NEED_RESPONSE_BODY) {
value = joinPoint.proceed(new Object[]{servletRequest, responseWrapper});
} else {
value = joinPoint.proceed(new Object[]{servletRequest, servletResponse});
}
} catch (Throwable e) {
((HttpServletRequest) args[0]).setAttribute("raw-api-uri", servletRequest.getRequestURI());
((HttpServletRequest) args[0]).setAttribute("raw-api-start", start);
throw e;
}

@SuppressWarnings("unchecked")
Map<String, String> pathMap = (Map<String, String>) servletRequest.getAttribute(HandlerMapping.URI_TEMPLATE_VARIABLES_ATTRIBUTE);

if (pathMap != null && !pathMap.isEmpty()) {
//写入path数据
request.setPathParameters(pathMap);
}

long timeTaken = (System.nanoTime() - start) / 1_000_000;


WebLog.Response response = new WebLog.Response();
int status = servletResponse.getStatus();
if (NEED_RESPONSE_BODY) {
boolean isSuccess = true;
// 成功的接口不用记录 responseBody
if (HttpStatus.HTTP_OK != status) {
isSuccess = false;
}
String responseBodyStr;
try {
responseBodyStr = IOUtils.toString(responseWrapper.getContentInputStream(), StandardCharsets.UTF_8.displayName());
} catch (Exception e) {
responseBodyStr = "";
isSuccess = false;
log.error("接口: {} ,IOUtils.toString 出现异常", servletRequest.getRequestURI());
}

// 失败的记录一下body
if (!isSuccess) {
response.setResponseBody(responseBodyStr);
}
try {
responseWrapper.copyBodyToResponse();
} catch (Exception e) {
log.error("接口: {} ,copyBodyToResponse 出现异常", servletRequest.getRequestURI());
}
}
response.setStatus(status);
Map<String, List<String>> responseHeaders = new HashMap<>();

Collection<String> servletResponseHeaders = servletResponse.getHeaderNames();
for (String headerName : servletResponseHeaders) {
Collection<String> values = servletResponse.getHeaders(headerName);
List<String> list = new ArrayList<>(values);
responseHeaders.put(headerName, list);
}
response.setHeaders(responseHeaders);

String bestUri = String.valueOf(servletRequest.getRequest().getAttribute(HandlerMapping.BEST_MATCHING_PATTERN_ATTRIBUTE));
//兼容处理 抛异常情况下该值null 用requestUri兼容
if(bestUri!=null && !bestUri.isEmpty() && !"null".equals(bestUri)) {
request.setUri(bestUri);
}else {
request.setUri(request.getRequestUri());
}

webLog.setTimeTaken(timeTaken);
webLog.setRequest(request);
webLog.setResponse(response);

log.info(objectMapper.writeValueAsString(webLog));

return value;
}


private boolean doNotIntercept(DelegateHttpRequest servletRequest) {

// 放行文件类型
if (servletRequest.getContentType() != null && servletRequest.getContentType().contains("multipart")) {
return true;
}

// 如果是 post的 x-www-form 类型 也就是 xx=xx&xx=xx&xx 这种格式的 (一般很少有这样使用的)
if ((servletRequest.getContentType() != null
&& servletRequest.getContentType().contains("application/x-www-form-urlencoded")
&& "post".equalsIgnoreCase(servletRequest.getMethod()))) {
return true;
}

// 放行不做拦截的uri
for (String uri : logbackProperties.getExcludeUrl()) {
if (uri.equals(servletRequest.getRequestURI())) {
return true;
}
}

return false;
}
}

日志bean

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
package com.jafir.logback.aop;

import lombok.Data;

import java.time.Instant;
import java.util.List;
import java.util.Map;

@Data
public class WebLog {
private Instant timestamp;
private Long timeTaken;
private Request request;
private Response response;

@Data
public static class Request {
private String method;
private String uri;
private String requestUri;
private Map<String, List<String>> headers;
private Map<String, List<String>> parameters;
private String body;
private Map<String, String> pathParameters;
}

@Data
public static class Response {
/**
* http 的 status
*/
private Integer status;
/**
* WebResponseBody 的 code
*/
private Integer bodyCode;
private Map<String, List<String>> headers;
private String responseBody;
}
}

request的代理类(主要目的是保留读取到的流数据。流只能读取一次)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
package com.jafir.logback.aop;

import javax.servlet.ServletInputStream;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class DelegateHttpRequest extends HttpServletRequestWrapper {
private byte[] bytes;
private final byte[] buffer = new byte[4096];

public DelegateHttpRequest(HttpServletRequest request) {
super(request);
}

@Override
public ServletInputStream getInputStream() throws IOException {
if (bytes == null) {
ServletInputStream inputStream = super.getInputStream();
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
int size = 0;
while ((size = inputStream.read(buffer)) != -1) {
outputStream.write(buffer, 0, size);
}
outputStream.close();
bytes = outputStream.toByteArray();
}

return new DelegateServletInputStream(new ByteArrayInputStream(bytes));
}
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
package com.jafir.logback.aop;

import javax.servlet.ReadListener;
import javax.servlet.ServletInputStream;
import java.io.IOException;
import java.io.InputStream;

public class DelegateServletInputStream extends ServletInputStream {
private final InputStream inputStream;

public DelegateServletInputStream(InputStream inputStream){
this.inputStream = inputStream;
}

@Override
public boolean isFinished() {
return false;
}

@Override
public boolean isReady() {
return true;
}

@Override
public void setReadListener(ReadListener listener) { }

@Override
public int read() throws IOException {
return inputStream.read();
}
}

以上核心内容其实就是 一个拦截器 进行拦截接口,然后按照一定的结构打印日志,然后logback再利用appender 写入到fluentd中,完成日志的收集。

fluent的日志收集大致如下:

fluency

收集的日志结构中比较重要的字段有:

timetaken: 接口耗时

request:请求

response:返回结果(错误时包含body信息)

status: 表示http的状态 正常都是200

bodyCode: 表示websponse结构里面的code值(如果你的返回结构是 在http之上 又封装了一层 msg code body的话 这里的bodyCode就是 返回的结果里面的code,一般我们会对其进行业务异常和系统异常的code区分。比如 200是正常,500为系统异常,其他是业务异常等)

目前比较重要的是logback-spring.xml中,有几个点。

env : 用于区分环境,logback-spring.xml是支持 profile 获取的

FLUENCY_SYNC: 普通的日志收集,就是整个应用程序的日志。如果是应用日志 则索引名为 $applicationName-年月日

FLUENCY_SYNC_ACCESS: 访问日志的收集,也就是接口的请求日志 通过webaspect拦截器写的日志。如果是访问日志 则索引名为access-$applicationName–年月日

这样的话可以在es中区分开应用日志和访问日志,FLUENCY_SYNC和FLUENCY_SYNC_ACCESS 也能够分开进行收集。

如: 普通的日志就用 FLUENCY_SYNC ,只有 WebLogAspect 下面的拦截器日志,用FLUENCY_SYNC_ACCESS收集

1
2
3
4
5
6
7
8
9
10
11
12
13
14
 <springProfile name="prod">
<!-- 日志输出级别 -->
<root level="INFO">
<appender-ref ref="STDOUT"/>
<!-- <appender-ref ref="LOCAL_ALL"/>-->
<!-- <appender-ref ref="LOCAL_ERROR"/>-->
<appender-ref ref="FLUENCY"/>
</root>

<logger name="com.jafir.logback.aop.WebLogAspect" level="INFO" additivity="false">
<appender-ref ref="STDOUT"/>
<appender-ref ref="FLUENCY_ACCESS"/>
</logger>
</springProfile>

如上则完成了 efk的日志收集,最终在kibana中可以通过新建pattern来查阅筛选日志信息。

日志监控和告警

对于服务的接口已经按照不同的索引存在于了es中,我们也可以用grafana来进行展示和监控。

grafana添加es datasource

img

img

添加监控表

img

错误统计

错误数query条件: 利用status 或者 bodeCode

img

1
env:"test" AND @log_name:"access-xxxx" AND   !response.status:"200"

意为:测试环境下的xxx服务,返回结果不等于200的数量

接口响应统计

接口响应query条件: 利用timeTaken

img

1
env:"test" AND @log_name:"access-jisu-http-web"
注意:

grafan的监控表 query不能使用变量,只能写死,所以可能会写多个环境 多个服务 多张表

添加告警

img

告警理论上可以使用grafana自身集成的alertmanager 但是尝试之后发现并不好用 所以我们还是使用 前面prometheus监控搭建得 alertmanager 和 prometheus-alert结合使用

img

这里就添加对应地址即可

img

img

其他可以默认 然后就好了

原理介绍

es数据源-》grafana (监控数据表 触发告警条件 发送告警) -》 alertmanager (配置路由到指定webhook) -》 prometheus-alert (根据不同模板组装数据)-》企业微信

alertmanager和prometheus-alert配置调整

prometheus-alert地址

1
http://192.168.20.2:8080/

img

找到grafana-wx 然后设置模板

1
2
3
4
5
6
7
8
9
10
11
12
13
{{range $k, $v := .alerts}}{{if eq $v.status "resolved"}}## [Prometheus恢复]()
###### 告警类型: {{$v.labels.alertname}}
###### 告警状态: {{ $v.status }}
###### 告警详情: {{$v.annotations.__value_string__}}
###### 故障时间:{{GetCSTtime $v.startsAt}}
###### 恢复时间:{{GetCSTtime $v.endsAt}}
{{else}}
## [Prometheus告警]()
###### 告警类型: {{$v.labels.alertname}}
###### 告警状态: {{ $v.status }}
###### 告警详情: {{$v.annotations.__value_string__}}
###### 故障时间:{{GetCSTtime $v.startsAt}}
{{end}}{{end}}

也可以进行测试 (测试内容可以从prometheus-alert日志中寻找)

1
{"receiver":"web\\.hook\\.grafanaalert","status":"resolved","alerts":[{"status":"resolved","labels":{"__alert_rule_namespace_uid__":"IrqNMj34z","__alert_rule_uid__":"lBw5-C3Vz","alertname":"DatasourceNoData","datasource_uid":"bP2dUr3Vz","ref_id":"A","rulename":"api-server错误"},"annotations":{"__dashboardUid__":"CBAou9qVz","__panelId__":"10"},"startsAt":"2023-08-03T00:00:12.197Z","endsAt":"2023-08-03T01:12:05.562Z","generatorURL":"http://localhost:3000/alerting/lBw5-C3Vz/edit","fingerprint":"f265175a34e6cc2e"}],"groupLabels":{"alertname":"DatasourceNoData"},"commonLabels":{"__alert_rule_namespace_uid__":"IrqNMj34z","__alert_rule_uid__":"lBw5-C3Vz","alertname":"DatasourceNoData","datasource_uid":"bP2dUr3Vz","ref_id":"A","rulename":"api-server错误"},"commonAnnotations":{"__dashboardUid__":"CBAou9qVz","__panelId__":"10"},"externalURL":"http://alertmanager:9093","version":"4","groupKey":"{}/{__alert_rule_namespace_uid__=\"IrqNMj34z\"}:{alertname=\"DatasourceNoData\"}","truncatedAlerts":0}

img

prometheus告警模板

img

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
{{range $k, $v := .alerts}}{{if eq $v.status "resolved"}}
## [Prometheus恢复]()
###### 告警类型: {{$v.labels.alertname}}
###### 故障主机: {{$v.labels.instance}}
###### 环境类型:{{$v.labels.job}}
###### 告警详情: {{$v.annotations.description}}
###### 故障时间:{{GetCSTtime $v.startsAt}}
###### 恢复时间:{{GetCSTtime $v.endsAt}}{{else}}
## [Prometheus告警]()
###### 告警类型: {{$v.labels.alertname}}
###### 故障主机: {{$v.labels.instance}}
###### 环境类型:{{$v.labels.job}}
###### 告警详情: {{$v.annotations.description}}
###### 故障时间:{{GetCSTtime $v.startsAt}}{{end}}
{{end}}

alertmanager配置

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
global:
resolve_timeout: 15s
route:
group_by: ['alertname','instance']
group_wait: 10s
group_interval: 10s
repeat_interval: 2m
receiver: 'web.hook.prometheusalert'
routes:
- receiver: 'web.hook.grafanaalert' # 路由到名为 "web.hook.grafanaalert" 的接收器
match:
__alert_rule_namespace_uid__: 'IrqNMj34z' # 匹配 alertname 为 "grafana" 的告警
receivers:
- name: 'web.hook.prometheusalert'
webhook_configs:
- url: 'http://prometheus-alert:8080/prometheusalert?type=wx&tpl=prometheus-wx&wxurl=你的企业微信webhook'
- name: 'web.hook.grafanaalert'
webhook_configs:
- url: 'http://prometheus-alert:8080/prometheusalert?type=wx&tpl=grafana-wx&wxurl=你的企业微信webhook'

配置含义:

10s 检测一下 2m 再重复提示

默认情况下都认为是prometheus的告警,走prometheusalert发送到对应prometheus-wx的模板

如果是数据包含 __alert_rule_namespace_uid__: 'IrqNMj34z' 则认为是grafana的告警 走grafanaalert发送到对应grafana-wx的模板

以上配置可以自适应调整,如果有发短信 或者 打电话告警的,也可以利用prometheusAlert全家桶的方式接入进来。

测验

配置好了之后 就可以在grafana进行告警测试了

img