Preface

I'm currently sharing a series of articles on DevOps, which I hope will be useful to those of you who love to learn and explore.

These posts record concise, efficient deployment and operations commands; the goal is to document them so a system can be rebuilt quickly — much like an ops runbook or manual, handy for rebuilding, reworking, and optimizing a system. Each article stands on its own and can serve as the deployment and usage guide for a single component.

This installment is DevOps series (8): EFK + Prometheus + Grafana log monitoring and alerting.
Outline

DevOps series introduction
DevOps series (1): Docker setup
DevOps series (2): GitLab setup
DevOps series (3): Nexus/Harbor setup
DevOps series (4): Jenkins setup
DevOps series (5): EFK stack setup
DevOps series (6): Grafana + Prometheus setup
DevOps series (7): Grafana + Prometheus monitoring and alerting
DevOps series (8): EFK + Prometheus + Grafana log monitoring and alerting
Main text

Log collection

We have already set up the EFK logging stack; the next step is to get log data flowing into it.
For Java services, collection can be handled on the framework side by writing a Logback-based log-collection starter, which makes installing and managing log collection easy. You can build it as a standalone starter project, or embed the code directly into your application.

Note: this article focuses on the core principles; the code may not be usable entirely as-is.
The dependencies to pull in:
```xml
<dependency>
    <groupId>com.sndyuk</groupId>
    <artifactId>logback-more-appenders</artifactId>
</dependency>
<dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-classic</artifactId>
</dependency>
<dependency>
    <groupId>org.komamitsu</groupId>
    <artifactId>fluency-core</artifactId>
</dependency>
<dependency>
    <groupId>org.komamitsu</groupId>
    <artifactId>fluency-fluentd</artifactId>
</dependency>
```
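The snippet above declares no versions, so presumably they are managed in a parent POM. A minimal sketch, assuming you pin them yourself — the version numbers here are illustrative only, check Maven Central for current releases:

```xml
<dependencyManagement>
    <dependencies>
        <!-- Versions below are assumptions for illustration; verify against Maven Central -->
        <dependency>
            <groupId>com.sndyuk</groupId>
            <artifactId>logback-more-appenders</artifactId>
            <version>1.8.8</version>
        </dependency>
        <dependency>
            <groupId>org.komamitsu</groupId>
            <artifactId>fluency-core</artifactId>
            <version>2.6.5</version>
        </dependency>
        <dependency>
            <groupId>org.komamitsu</groupId>
            <artifactId>fluency-fluentd</artifactId>
            <version>2.6.5</version>
        </dependency>
    </dependencies>
</dependencyManagement>
```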
The core Logback configuration file, logback-spring.xml:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration debug="false">
    <!-- Define the log storage location. Avoid relative paths in Logback configuration. -->
    <springProperty name="profile" source="spring.profiles.active"/>
    <springProperty name="applicationName" source="spring.application.name"/>
    <!-- Default Fluentd address -->
    <springProperty name="fluentdAddr" source="framework.logback.fluentd-addr" defaultValue="fluentd.jafir.top"/>
    <property name="LOG_HOME" value="/${applicationName}/logs"/>

    <!-- Console output -->
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
            <!-- Format: %d = date, %thread = thread name, %-5level = level padded to 5 chars, %msg = message, %n = newline -->
            <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{50} - %msg%n</pattern>
        </encoder>
    </appender>

    <!-- INFO-and-above logs -->
    <appender name="LOCAL_ALL" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
            <level>INFO</level>
        </filter>
        <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
            <!-- Rolled log file name -->
            <FileNamePattern>${LOG_HOME}/info_log.%d{yyyy-MM-dd}.log</FileNamePattern>
            <!-- Days of log files to keep -->
            <MaxHistory>30</MaxHistory>
        </rollingPolicy>
        <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
            <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{50} - %msg%n</pattern>
            <!-- Set the character encoding to avoid garbled non-ASCII text -->
            <charset class="java.nio.charset.Charset">UTF-8</charset>
        </encoder>
        <!-- Maximum size of a single log file -->
        <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
            <MaxFileSize>10MB</MaxFileSize>
        </triggeringPolicy>
    </appender>

    <!-- Error logs -->
    <appender name="LOCAL_ERROR" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <filter class="ch.qos.logback.classic.filter.LevelFilter">
            <level>ERROR</level>
            <onMatch>ACCEPT</onMatch>
            <onMismatch>DENY</onMismatch>
        </filter>
        <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
            <FileNamePattern>${LOG_HOME}/error_log.%d{yyyy-MM-dd}.log</FileNamePattern>
            <MaxHistory>30</MaxHistory>
        </rollingPolicy>
        <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
            <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{50} - %msg%n</pattern>
            <charset class="java.nio.charset.Charset">UTF-8</charset>
        </encoder>
        <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
            <MaxFileSize>10MB</MaxFileSize>
        </triggeringPolicy>
    </appender>

    <!-- Fluency: application logs -->
    <appender name="FLUENCY_SYNC" class="ch.qos.logback.more.appenders.FluencyLogbackAppender">
        <!-- Tag for Fluentd. Further information: http://docs.fluentd.org/articles/config-file -->
        <!-- Microservice name -->
        <tag>${applicationName}</tag>
        <!-- Host name/address and port number where Fluentd is running -->
        <remoteHost>${fluentdAddr}</remoteHost>
        <port>24224</port>
        <!-- [Optional] Multiple hosts/ports where Fluentd is running
        <remoteServers>
            <remoteServer>
                <host>primary</host>
                <port>24224</port>
            </remoteServer>
            <remoteServer>
                <host>secondary</host>
                <port>24224</port>
            </remoteServer>
        </remoteServers>
        -->
        <!-- [Optional] Additional fields (key/value pairs) -->
        <!-- Environment -->
        <additionalField>
            <key>env</key>
            <value>${profile}</value>
        </additionalField>
        <!-- [Optional] Configurations to customize Fluency's behavior: https://github.com/komamitsu/fluency#usage -->
        <ackResponseMode>false</ackResponseMode>
        <!-- <fileBackupDir>/tmp</fileBackupDir> -->
        <bufferChunkInitialSize>33554432</bufferChunkInitialSize>
        <bufferChunkRetentionSize>268435456</bufferChunkRetentionSize>
        <maxBufferSize>1073741824</maxBufferSize>
        <bufferChunkRetentionTimeMillis>1000</bufferChunkRetentionTimeMillis>
        <connectionTimeoutMilli>5000</connectionTimeoutMilli>
        <readTimeoutMilli>5000</readTimeoutMilli>
        <waitUntilBufferFlushed>30</waitUntilBufferFlushed>
        <waitUntilFlusherTerminated>40</waitUntilFlusherTerminated>
        <flushAttemptIntervalMillis>200</flushAttemptIntervalMillis>
        <senderMaxRetryCount>12</senderMaxRetryCount>
        <!-- [Optional] Enable/disable EventTime for sub-second resolution of log event timestamps -->
        <useEventTime>true</useEventTime>
        <sslEnabled>false</sslEnabled>
        <!-- [Optional] Enable/disable use of the JVM heap for buffering -->
        <jvmHeapBufferMode>false</jvmHeapBufferMode>
        <!-- [Optional] If true, Map Marker is expanded instead of nesting in the marker name -->
        <flattenMapMarker>false</flattenMapMarker>
        <!-- [Optional] default "marker" -->
        <markerPrefix></markerPrefix>
        <!-- [Optional] Message encoder if you want to customize the message -->
        <encoder>
            <pattern><![CDATA[%-5level %logger{50}#%line %message]]></pattern>
        </encoder>
        <!-- [Optional] Message field key name. Default: "message" -->
        <messageFieldKeyName>msg</messageFieldKeyName>
    </appender>

    <!-- Fluency: access logs -->
    <appender name="FLUENCY_SYNC_ACCESS" class="ch.qos.logback.more.appenders.FluencyLogbackAppender">
        <!-- Microservice name, prefixed so access logs get their own index -->
        <tag>access-${applicationName}</tag>
        <remoteHost>${fluentdAddr}</remoteHost>
        <port>24224</port>
        <!-- Environment -->
        <additionalField>
            <key>env</key>
            <value>${profile}</value>
        </additionalField>
        <!-- Same Fluency tuning options as FLUENCY_SYNC above -->
        <ackResponseMode>false</ackResponseMode>
        <bufferChunkInitialSize>33554432</bufferChunkInitialSize>
        <bufferChunkRetentionSize>268435456</bufferChunkRetentionSize>
        <maxBufferSize>1073741824</maxBufferSize>
        <bufferChunkRetentionTimeMillis>1000</bufferChunkRetentionTimeMillis>
        <connectionTimeoutMilli>5000</connectionTimeoutMilli>
        <readTimeoutMilli>5000</readTimeoutMilli>
        <waitUntilBufferFlushed>30</waitUntilBufferFlushed>
        <waitUntilFlusherTerminated>40</waitUntilFlusherTerminated>
        <flushAttemptIntervalMillis>200</flushAttemptIntervalMillis>
        <senderMaxRetryCount>12</senderMaxRetryCount>
        <useEventTime>true</useEventTime>
        <sslEnabled>false</sslEnabled>
        <jvmHeapBufferMode>false</jvmHeapBufferMode>
        <flattenMapMarker>false</flattenMapMarker>
        <markerPrefix></markerPrefix>
        <encoder>
            <pattern>%message%n</pattern>
        </encoder>
        <messageFieldKeyName>msg</messageFieldKeyName>
    </appender>

    <!-- Async wrappers so logging never blocks request threads -->
    <appender name="FLUENCY" class="ch.qos.logback.classic.AsyncAppender">
        <!-- Max queue size of logs waiting to be sent (when the queue is full, new logs are dropped) -->
        <queueSize>999</queueSize>
        <!-- Never block when the queue becomes full -->
        <neverBlock>true</neverBlock>
        <!-- Maximum queue flush time allowed during appender stop; if the worker takes longer,
             it exits and discards any remaining items in the queue -->
        <maxFlushTime>1000</maxFlushTime>
        <appender-ref ref="FLUENCY_SYNC"/>
    </appender>

    <appender name="FLUENCY_ACCESS" class="ch.qos.logback.classic.AsyncAppender">
        <queueSize>999</queueSize>
        <neverBlock>true</neverBlock>
        <maxFlushTime>1000</maxFlushTime>
        <appender-ref ref="FLUENCY_SYNC_ACCESS"/>
    </appender>

    <springProfile name="local">
        <!-- Log output level -->
        <root level="INFO">
            <appender-ref ref="STDOUT"/>
            <appender-ref ref="FLUENCY"/>
            <!-- <appender-ref ref="LOCAL_ALL"/> -->
            <!-- <appender-ref ref="LOCAL_ERROR"/> -->
        </root>
        <logger name="com.jafir.logback.aop.WebLogAspect" level="INFO" additivity="false">
            <appender-ref ref="STDOUT"/>
            <!-- <appender-ref ref="FLUENCY_ACCESS"/> -->
        </logger>
    </springProfile>

    <springProfile name="dev,test,preprod">
        <root level="INFO">
            <appender-ref ref="STDOUT"/>
            <!-- <appender-ref ref="LOCAL_ALL"/> -->
            <!-- <appender-ref ref="LOCAL_ERROR"/> -->
            <appender-ref ref="FLUENCY"/>
        </root>
        <logger name="com.jafir.logback.aop.WebLogAspect" level="INFO" additivity="false">
            <appender-ref ref="STDOUT"/>
            <appender-ref ref="FLUENCY_ACCESS"/>
        </logger>
    </springProfile>

    <springProfile name="prod">
        <root level="INFO">
            <appender-ref ref="STDOUT"/>
            <!-- <appender-ref ref="LOCAL_ALL"/> -->
            <!-- <appender-ref ref="LOCAL_ERROR"/> -->
            <appender-ref ref="FLUENCY"/>
        </root>
        <logger name="com.jafir.logback.aop.WebLogAspect" level="INFO" additivity="false">
            <appender-ref ref="STDOUT"/>
            <appender-ref ref="FLUENCY_ACCESS"/>
        </logger>
    </springProfile>

    <!-- Silence Fluency's own loggers -->
    <logger name="org.komamitsu.fluency.Fluency" level="OFF"/>
    <logger name="org.komamitsu.fluency.fluentd.ingester.sender.RetryableSender" level="OFF"/>
    <logger name="org.komamitsu.fluency.fluentd.ingester.sender.NetworkSender" level="OFF"/>
</configuration>
```
The Spring Boot AutoConfiguration class:
```java
package com.jafir.logback;

import com.jafir.logback.aop.WebLogAspect;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.boot.context.properties.EnableConfigurationProperties;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Import;

@Configuration
@Import(WebLogAspect.class)
@EnableConfigurationProperties(LogbackProperties.class)
@ConditionalOnProperty(prefix = "framework.logback", value = "enabled", havingValue = "true", matchIfMissing = true)
@ComponentScan(value = "com.jafir.logback")
public class LogbackAutoConfiguration {
}
```
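For the starter to be picked up by consuming applications, the auto-configuration class also has to be registered. A minimal sketch, assuming Spring Boot 2.x (which the javax.servlet imports suggest; Spring Boot 3 uses META-INF/spring/org.springframework.boot.autoconfigure.AutoConfiguration.imports instead):

```properties
# src/main/resources/META-INF/spring.factories
org.springframework.boot.autoconfigure.EnableAutoConfiguration=\
  com.jafir.logback.LogbackAutoConfiguration
```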
And the configuration properties class:

```java
package com.jafir.logback;

import org.springframework.boot.context.properties.ConfigurationProperties;

import java.util.List;

@ConfigurationProperties("framework.logback")
public class LogbackProperties {

    /** Whether log collection is enabled. */
    private Boolean enabled = false;

    /** Address of the Fluentd instance to ship logs to. */
    private String fluentdAddr = "fluentd.jafir.top";

    /** Request URIs to exclude from access-log collection. */
    private List<String> excludeUrl;

    public List<String> getExcludeUrl() {
        return excludeUrl;
    }

    public void setExcludeUrl(List<String> excludeUrl) {
        this.excludeUrl = excludeUrl;
    }

    public Boolean getEnabled() {
        return enabled;
    }

    public void setEnabled(Boolean enabled) {
        this.enabled = enabled;
    }

    public String getFluentdAddr() {
        return fluentdAddr;
    }

    public void setFluentdAddr(String fluentdAddr) {
        this.fluentdAddr = fluentdAddr;
    }
}
```
Assembly can then be controlled from the YAML configuration file (see the sketch after this list):

framework.logback.enabled — whether log collection is enabled
framework.logback.fluentdAddr — the Fluentd address
framework.logback.excludeUrl — URIs excluded from collection
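A minimal application.yml sketch; the values are illustrative, and the kebab-case keys bind to the properties above via Spring's relaxed binding:

```yaml
framework:
  logback:
    enabled: true
    fluentd-addr: fluentd.jafir.top
    exclude-url:
      - /actuator/prometheus
      - /health/detect
```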
The core servlet interceptor (aspect) class:
```java
package com.jafir.logback.aop;

import cn.hutool.core.collection.CollUtil;
import cn.hutool.http.HttpStatus;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.jafir.logback.LogbackProperties;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.io.IOUtils;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.web.servlet.HandlerMapping;
import org.springframework.web.util.ContentCachingResponseWrapper;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.*;

@Aspect
@Slf4j
public class WebLogAspect {

    private final boolean NEED_RESPONSE_BODY = true;
    private final LogbackProperties logbackProperties;
    private final ObjectMapper objectMapper;

    private static final List<String> DEFAULT_EXCLUDE_URL = new ArrayList<>();

    static {
        DEFAULT_EXCLUDE_URL.add("/actuator/prometheus");
        DEFAULT_EXCLUDE_URL.add("/health/detect");
    }

    public WebLogAspect(ObjectMapper objectMapper, LogbackProperties logbackProperties) {
        this.objectMapper = objectMapper;
        this.logbackProperties = logbackProperties;
        if (CollUtil.isNotEmpty(logbackProperties.getExcludeUrl())) {
            logbackProperties.getExcludeUrl().addAll(DEFAULT_EXCLUDE_URL);
        } else {
            logbackProperties.setExcludeUrl(DEFAULT_EXCLUDE_URL);
        }
    }

    @Around("execution(public void javax.servlet.http.HttpServlet.service(..))")
    public Object webLog(ProceedingJoinPoint joinPoint) throws Throwable {
        Object[] args = joinPoint.getArgs();
        DelegateHttpRequest servletRequest = new DelegateHttpRequest((HttpServletRequest) args[0]);
        HttpServletResponse servletResponse = (HttpServletResponse) args[1];
        ContentCachingResponseWrapper responseWrapper = null;
        if (NEED_RESPONSE_BODY) {
            responseWrapper = new ContentCachingResponseWrapper(servletResponse);
        }
        // Skip requests that should not be intercepted
        if (doNotIntercept(servletRequest)) {
            return joinPoint.proceed();
        }
        WebLog webLog = new WebLog();
        webLog.setTimestamp(Instant.now());
        WebLog.Request request = new WebLog.Request();
        // Read (and cache) the request body
        InputStream servletRequestStream = servletRequest.getInputStream();
        int size;
        byte[] buffer = new byte[1024];
        ByteArrayOutputStream tmpRequestStream = new ByteArrayOutputStream();
        while ((size = servletRequestStream.read(buffer)) != -1) {
            tmpRequestStream.write(buffer, 0, size);
        }
        request.setBody(tmpRequestStream.toString());
        // Collect request headers
        Map<String, List<String>> requestHeaders = new HashMap<>();
        Enumeration<String> servletRequestHeaders = servletRequest.getHeaderNames();
        while (servletRequestHeaders.hasMoreElements()) {
            String header = servletRequestHeaders.nextElement();
            Enumeration<String> values = servletRequest.getHeaders(header);
            List<String> list = new ArrayList<>();
            while (values.hasMoreElements()) {
                list.add(values.nextElement());
            }
            requestHeaders.put(header, list);
        }
        request.setHeaders(requestHeaders);
        request.setMethod(servletRequest.getMethod());
        Object rawUrl = servletRequest.getAttribute("raw-api-uri");
        if (rawUrl instanceof String) {
            request.setRequestUri((String) rawUrl);
        } else {
            request.setRequestUri(servletRequest.getRequestURI());
        }
        // Collect query/form parameters
        Map<String, List<String>> parameters = new HashMap<>();
        for (Map.Entry<String, String[]> entry : servletRequest.getParameterMap().entrySet()) {
            parameters.put(entry.getKey(), new ArrayList<>(Arrays.asList(entry.getValue())));
        }
        request.setParameters(parameters);
        Object attributeStart = servletRequest.getAttribute("raw-api-start");
        long start;
        if (attributeStart instanceof Long) {
            start = (long) attributeStart;
        } else {
            start = System.nanoTime();
        }
        Object value;
        try {
            if (NEED_RESPONSE_BODY) {
                value = joinPoint.proceed(new Object[]{servletRequest, responseWrapper});
            } else {
                value = joinPoint.proceed(new Object[]{servletRequest, servletResponse});
            }
        } catch (Throwable e) {
            // Stash the URI and start time so a re-dispatch can still report them
            ((HttpServletRequest) args[0]).setAttribute("raw-api-uri", servletRequest.getRequestURI());
            ((HttpServletRequest) args[0]).setAttribute("raw-api-start", start);
            throw e;
        }
        @SuppressWarnings("unchecked")
        Map<String, String> pathMap = (Map<String, String>) servletRequest.getAttribute(HandlerMapping.URI_TEMPLATE_VARIABLES_ATTRIBUTE);
        if (pathMap != null && !pathMap.isEmpty()) {
            // Record path variables
            request.setPathParameters(pathMap);
        }
        long timeTaken = (System.nanoTime() - start) / 1_000_000;
        WebLog.Response response = new WebLog.Response();
        int status = servletResponse.getStatus();
        if (NEED_RESPONSE_BODY) {
            boolean isSuccess = true;
            // No need to record the response body for successful calls
            if (HttpStatus.HTTP_OK != status) {
                isSuccess = false;
            }
            String responseBodyStr;
            try {
                responseBodyStr = IOUtils.toString(responseWrapper.getContentInputStream(), StandardCharsets.UTF_8.displayName());
            } catch (Exception e) {
                responseBodyStr = "";
                isSuccess = false;
                log.error("Endpoint: {}, IOUtils.toString threw an exception", servletRequest.getRequestURI());
            }
            // Record the body for failed calls
            if (!isSuccess) {
                response.setResponseBody(responseBodyStr);
            }
            try {
                responseWrapper.copyBodyToResponse();
            } catch (Exception e) {
                log.error("Endpoint: {}, copyBodyToResponse threw an exception", servletRequest.getRequestURI());
            }
        }
        response.setStatus(status);
        // Collect response headers
        Map<String, List<String>> responseHeaders = new HashMap<>();
        for (String headerName : servletResponse.getHeaderNames()) {
            responseHeaders.put(headerName, new ArrayList<>(servletResponse.getHeaders(headerName)));
        }
        response.setHeaders(responseHeaders);
        String bestUri = String.valueOf(servletRequest.getRequest().getAttribute(HandlerMapping.BEST_MATCHING_PATTERN_ATTRIBUTE));
        // When an exception was thrown this attribute is null; fall back to the raw requestUri
        if (bestUri != null && !bestUri.isEmpty() && !"null".equals(bestUri)) {
            request.setUri(bestUri);
        } else {
            request.setUri(request.getRequestUri());
        }
        webLog.setTimeTaken(timeTaken);
        webLog.setRequest(request);
        webLog.setResponse(response);
        log.info(objectMapper.writeValueAsString(webLog));
        return value;
    }

    private boolean doNotIntercept(DelegateHttpRequest servletRequest) {
        // Skip multipart (file upload) requests
        if (servletRequest.getContentType() != null && servletRequest.getContentType().contains("multipart")) {
            return true;
        }
        // Skip POSTs with x-www-form-urlencoded bodies (the xx=xx&xx=xx format; rarely used)
        if (servletRequest.getContentType() != null
                && servletRequest.getContentType().contains("application/x-www-form-urlencoded")
                && "post".equalsIgnoreCase(servletRequest.getMethod())) {
            return true;
        }
        // Skip explicitly excluded URIs
        for (String uri : logbackProperties.getExcludeUrl()) {
            if (uri.equals(servletRequest.getRequestURI())) {
                return true;
            }
        }
        return false;
    }
}
```
The log bean:
```java
package com.jafir.logback.aop;

import lombok.Data;

import java.time.Instant;
import java.util.List;
import java.util.Map;

@Data
public class WebLog {
    private Instant timestamp;
    private Long timeTaken;
    private Request request;
    private Response response;

    @Data
    public static class Request {
        private String method;
        private String uri;
        private String requestUri;
        private Map<String, List<String>> headers;
        private Map<String, List<String>> parameters;
        private String body;
        private Map<String, String> pathParameters;
    }

    @Data
    public static class Response {
        /**
         * The HTTP status
         */
        private Integer status;
        /**
         * The code from the WebResponseBody envelope
         */
        private Integer bodyCode;
        private Map<String, List<String>> headers;
        private String responseBody;
    }
}
```
The request delegate class (its main purpose is to retain the stream data once read, since the stream can only be read once):
```java
package com.jafir.logback.aop;

import javax.servlet.ServletInputStream;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class DelegateHttpRequest extends HttpServletRequestWrapper {

    private byte[] bytes;
    private final byte[] buffer = new byte[4096];

    public DelegateHttpRequest(HttpServletRequest request) {
        super(request);
    }

    @Override
    public ServletInputStream getInputStream() throws IOException {
        // Read the underlying stream once, cache the bytes, and serve fresh streams afterwards
        if (bytes == null) {
            ServletInputStream inputStream = super.getInputStream();
            ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
            int size;
            while ((size = inputStream.read(buffer)) != -1) {
                outputStream.write(buffer, 0, size);
            }
            outputStream.close();
            bytes = outputStream.toByteArray();
        }
        return new DelegateServletInputStream(new ByteArrayInputStream(bytes));
    }
}
```
```java
package com.jafir.logback.aop;

import javax.servlet.ReadListener;
import javax.servlet.ServletInputStream;
import java.io.IOException;
import java.io.InputStream;

public class DelegateServletInputStream extends ServletInputStream {

    private final InputStream inputStream;

    public DelegateServletInputStream(InputStream inputStream) {
        this.inputStream = inputStream;
    }

    @Override
    public boolean isFinished() {
        // Backed by an in-memory stream, so "finished" means no bytes remain
        // (the original always returned false, which is misleading for callers)
        try {
            return inputStream.available() == 0;
        } catch (IOException e) {
            return true;
        }
    }

    @Override
    public boolean isReady() {
        return true;
    }

    @Override
    public void setReadListener(ReadListener listener) {
        // Non-blocking IO is not supported by this in-memory delegate
    }

    @Override
    public int read() throws IOException {
        return inputStream.read();
    }
}
```
At its core, all of the above is an interceptor that intercepts endpoint calls and logs them in a fixed structure; Logback's appender then ships those logs to Fluentd, completing the collection.

Fluentd's side of the collection looks roughly like this:
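(The original post's illustration isn't reproduced here; the following fluent.conf is a minimal sketch of the idea, assuming the fluent-plugin-elasticsearch plugin and an Elasticsearch host named elasticsearch on port 9200 — adjust both to your environment. It listens on the forward port 24224 that the Fluency appenders target and derives the index name from the tag, yielding indices like myapp-20230803 and access-myapp-20230803.)

```
# Accept logs from Fluency over the forward protocol
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

# Write every event to Elasticsearch, one dated index per tag
<match **>
  @type elasticsearch
  host elasticsearch
  port 9200
  logstash_format true
  # Index prefix comes from the Fluentd tag: applicationName or access-applicationName
  logstash_prefix ${tag}
  logstash_dateformat %Y%m%d
  # Keep the tag on each document so it can be filtered as @log_name in queries
  include_tag_key true
  tag_key @log_name
  <buffer tag>
    flush_interval 5s
  </buffer>
</match>
```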
The more important fields in the collected log structure (see the sample document after this list):

timeTaken: endpoint latency in milliseconds
request: the request details
response: the response (the body is included when the call failed)
status: the HTTP status; normally 200
bodyCode: the code inside the response envelope (if your API wraps results in a msg/code/body envelope on top of HTTP, bodyCode is the code from that envelope; such codes usually distinguish business errors from system errors — e.g. 200 for success, 500 for a system error, other values for business errors)
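A hypothetical access-log document, for illustration only — the field layout follows the WebLog bean above, the values are made up, and note that the aspect shown never actually populates bodyCode (that would need your own envelope-parsing logic):

```json
{
  "timestamp": "2023-08-03T08:15:30.123Z",
  "timeTaken": 42,
  "request": {
    "method": "GET",
    "uri": "/api/users/{id}",
    "requestUri": "/api/users/1001",
    "headers": {"host": ["api.example.com"]},
    "parameters": {},
    "body": "",
    "pathParameters": {"id": "1001"}
  },
  "response": {
    "status": 500,
    "bodyCode": 500,
    "headers": {"content-type": ["application/json"]},
    "responseBody": "{\"code\":500,\"msg\":\"internal error\"}"
  }
}
```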
The most important piece at this point is logback-spring.xml; a few things to note:

env: used to distinguish environments; logback-spring.xml can read the active Spring profile
FLUENCY_SYNC: ordinary log collection, i.e. the application's own logs. Application logs land in an index named $applicationName-YYYYMMDD
FLUENCY_SYNC_ACCESS: access-log collection, i.e. the request logs written by the WebLogAspect interceptor. Access logs land in an index named access-$applicationName-YYYYMMDD

This separates application logs from access logs in ES, and lets FLUENCY_SYNC and FLUENCY_SYNC_ACCESS be collected independently. For example: ordinary logs go through FLUENCY_SYNC, while only the interceptor logs from WebLogAspect go through FLUENCY_SYNC_ACCESS:
```xml
<springProfile name="prod">
    <!-- Log output level -->
    <root level="INFO">
        <appender-ref ref="STDOUT"/>
        <!-- <appender-ref ref="LOCAL_ALL"/> -->
        <!-- <appender-ref ref="LOCAL_ERROR"/> -->
        <appender-ref ref="FLUENCY"/>
    </root>
    <logger name="com.jafir.logback.aop.WebLogAspect" level="INFO" additivity="false">
        <appender-ref ref="STDOUT"/>
        <appender-ref ref="FLUENCY_ACCESS"/>
    </logger>
</springProfile>
```
With that, EFK log collection is complete; in Kibana you can create index patterns to browse and filter the log data.
Log monitoring and alerting

The services' request logs now sit in ES under their own indices, so we can also use Grafana to visualize and monitor them.

Add an ES datasource in Grafana.

Add monitoring panels.
Error statistics. The error-count query condition uses status or bodyCode:

```
env:"test" AND @log_name:"access-xxxx" AND !response.status:"200"
```

Meaning: for the xxxx service in the test environment, the number of requests whose response status is not 200.
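A variant keyed on the envelope code instead — assuming your response envelope's code is indexed as response.bodyCode, per the bean above — might look like:

```
env:"test" AND @log_name:"access-xxxx" AND !response.bodyCode:"200"
```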
Endpoint latency statistics. The latency query condition uses timeTaken:

```
env:"test" AND @log_name:"access-jisu-http-web"
```

Note: queries in Grafana's monitoring panels cannot use template variables and must be hard-coded, so you may end up writing separate queries and panels for each environment and each service.
Add alerts

In theory you could use Grafana's own built-in alerting/Alertmanager, but after trying it we found it clunky, so we stick with the Alertmanager from the earlier Prometheus monitoring setup, combined with PrometheusAlert.

Here you only need to add the corresponding Alertmanager address; everything else can stay at its defaults.

How it works:

ES datasource → Grafana (monitoring panels evaluate alert conditions and fire alerts) → Alertmanager (routes to the configured webhook) → PrometheusAlert (fills the data into the matching template) → WeCom (enterprise WeChat)
Adjusting the Alertmanager and PrometheusAlert configuration

The PrometheusAlert address:

```
http://192.168.20.2:8080/
```

Find the grafana-wx template and set it up:
```
{{range $k, $v := .alerts}}{{if eq $v.status "resolved"}}
## [Prometheus Resolved]()
###### Alert type: {{$v.labels.alertname}}
###### Alert status: {{ $v.status }}
###### Alert details: {{$v.annotations.__value_string__}}
###### Fired at: {{GetCSTtime $v.startsAt}}
###### Resolved at: {{GetCSTtime $v.endsAt}}
{{else}}
## [Prometheus Alert]()
###### Alert type: {{$v.labels.alertname}}
###### Alert status: {{ $v.status }}
###### Alert details: {{$v.annotations.__value_string__}}
###### Fired at: {{GetCSTtime $v.startsAt}}
{{end}}{{end}}
```
You can also run a test (test payloads can be dug out of the PrometheusAlert logs):
1 {"receiver":"web\\.hook\\.grafanaalert","status":"resolved","alerts":[{"status":"resolved","labels":{"__alert_rule_namespace_uid__":"IrqNMj34z","__alert_rule_uid__":"lBw5-C3Vz","alertname":"DatasourceNoData","datasource_uid":"bP2dUr3Vz","ref_id":"A","rulename":"api-server错误"},"annotations":{"__dashboardUid__":"CBAou9qVz","__panelId__":"10"},"startsAt":"2023-08-03T00:00:12.197Z","endsAt":"2023-08-03T01:12:05.562Z","generatorURL":"http://localhost:3000/alerting/lBw5-C3Vz/edit","fingerprint":"f265175a34e6cc2e"}],"groupLabels":{"alertname":"DatasourceNoData"},"commonLabels":{"__alert_rule_namespace_uid__":"IrqNMj34z","__alert_rule_uid__":"lBw5-C3Vz","alertname":"DatasourceNoData","datasource_uid":"bP2dUr3Vz","ref_id":"A","rulename":"api-server错误"},"commonAnnotations":{"__dashboardUid__":"CBAou9qVz","__panelId__":"10"},"externalURL":"http://alertmanager:9093","version":"4","groupKey":"{}/{__alert_rule_namespace_uid__=\"IrqNMj34z\"}:{alertname=\"DatasourceNoData\"}","truncatedAlerts":0}
The Prometheus alert template (prometheus-wx):

```
{{range $k, $v := .alerts}}{{if eq $v.status "resolved"}}
## [Prometheus Resolved]()
###### Alert type: {{$v.labels.alertname}}
###### Affected host: {{$v.labels.instance}}
###### Environment: {{$v.labels.job}}
###### Alert details: {{$v.annotations.description}}
###### Fired at: {{GetCSTtime $v.startsAt}}
###### Resolved at: {{GetCSTtime $v.endsAt}}{{else}}
## [Prometheus Alert]()
###### Alert type: {{$v.labels.alertname}}
###### Affected host: {{$v.labels.instance}}
###### Environment: {{$v.labels.job}}
###### Alert details: {{$v.annotations.description}}
###### Fired at: {{GetCSTtime $v.startsAt}}{{end}}
{{end}}
```
The Alertmanager configuration:

```yaml
global:
  resolve_timeout: 15s
route:
  group_by: ['alertname', 'instance']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 2m
  receiver: 'web.hook.prometheusalert'
  routes:
    - receiver: 'web.hook.grafanaalert'  # Route to the receiver named "web.hook.grafanaalert"
      match:
        __alert_rule_namespace_uid__: 'IrqNMj34z'  # Matches alerts carrying this Grafana rule-namespace label
receivers:
  - name: 'web.hook.prometheusalert'
    webhook_configs:
      - url: 'http://prometheus-alert:8080/prometheusalert?type=wx&tpl=prometheus-wx&wxurl=<your WeCom webhook URL>'
  - name: 'web.hook.grafanaalert'
    webhook_configs:
      - url: 'http://prometheus-alert:8080/prometheusalert?type=wx&tpl=grafana-wx&wxurl=<your WeCom webhook URL>'
```
What this configuration means:

Alerts are grouped and checked every 10s, and an unresolved alert repeats every 2m.
By default, every alert is assumed to come from Prometheus and goes through web.hook.prometheusalert, rendered with the prometheus-wx template.
If the payload contains __alert_rule_namespace_uid__: 'IrqNMj34z', it is treated as a Grafana alert and goes through web.hook.grafanaalert, rendered with the grafana-wx template.
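Before reloading, you can sanity-check the edited file, assuming amtool (shipped alongside Alertmanager) is available and your Alertmanager instance exposes the standard lifecycle endpoint:

```bash
# Validate the Alertmanager configuration file
amtool check-config alertmanager.yml

# Ask a running Alertmanager to reload its configuration
curl -X POST http://alertmanager:9093/-/reload
```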
The configuration above can be adapted to your needs; if you also want SMS or phone-call alerts, they can be hooked in the same way via the PrometheusAlert suite.

Testing

Once everything is configured, you can trigger a test alert from Grafana.