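This article is an imperfect example: it adds a sensitive-word filter to the response.
I call it imperfect because testing showed that normal, structural parts of the page were filtered as well.
Paste the two strings below into UltraEdit and compare them; you will see that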
<img src="/media/resources/images/banner-graphic.png"/>
was replaced with the following:
<img src="/media/resources/imageX/Xanner-graphic.png"/>
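Reading the filter code further down, the cause appears to be that filterText removes punctuation before matching, so "images/banner" is compared as "imagesbanner", which contains the listed word "sb". The sketch below reproduces only that effect; the class and method names (PunctuationStripDemo, strip, mask) are made up for illustration, and plain indexOf stands in for the trie lookup.

```java
import java.util.ArrayList;
import java.util.List;

public class PunctuationStripDemo {
    /** Drops punctuation, separators, symbols, marks and control characters, recording each kept character's original index. */
    static String strip(String s, List<Integer> offsets) {
        StringBuilder kept = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (!String.valueOf(s.charAt(i)).matches("[\\pP\\pZ\\pS\\pM\\pC]")) {
                kept.append(s.charAt(i));
                offsets.add(i);
            }
        }
        return kept.toString();
    }

    /** Finds word in the stripped text and masks the corresponding characters in the original text. */
    static String mask(String original, String word, char replacement) {
        List<Integer> offsets = new ArrayList<>();
        String stripped = strip(original, offsets);
        StringBuilder result = new StringBuilder(original);
        int from = 0;
        int hit;
        while ((hit = stripped.indexOf(word, from)) >= 0) {
            for (int k = hit; k < hit + word.length(); k++) {
                result.setCharAt(offsets.get(k), replacement);
            }
            from = hit + word.length();
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // prints /media/resources/imageX/Xanner-graphic.png
        System.out.println(mask("/media/resources/images/banner-graphic.png", "sb", 'X'));
    }
}
```

The slash between "images" and "banner" is skipped during matching but kept in the output, which is why the result reads "imageX/Xanner".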
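Console output from the test run (the first line is a debug marker, the second is the detected level):
11111111111111111111
4
Original string: <?xml version="1.0" encoding="UTF-8"?>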
<!DOCTYPE HTML SYSTEM "about:legacy-compat">
<html><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/><meta content="IE=8" http-equiv="X-UA-Compatible"/><link href="/media/resources/dijit/themes/tundra/tundra.css" type="text/css" rel="stylesheet"/><link href="/media/resources/styles/standard.css" media="screen" type="text/css" rel="stylesheet"/><link href="/media/resources/images/favicon.ico" rel="SHORTCUT ICON"/><script type="text/javascript">var djConfig = {parseOnLoad: false, isDebug: false, locale: 'zh-cn'};</script><script type="text/javascript" src="/media/resources/dojo/dojo.js"></script><script type="text/javascript" src="/media/resources/spring/Spring.js"></script><script type="text/javascript" src="/media/resources/spring/Spring-Dojo.js"></script><script type="text/javascript" language="JavaScript">dojo.require("dojo.parser");</script><title>Welcome to media</title></head><body class="tundra spring"><div id="wrapper"><div version="2.0" id="header"><a title="Home" name="Home" href="/media/"><img src="/media/resources/images/banner-graphic.png"/></a></div><div version="2.0" id="menu"></div><div id="main"><div version="2.0"><script type="text/javascript">dojo.require('dijit.TitlePane');</script><div id="_title_title_id"><script type="text/javascript">Spring.addDecoration(new Spring.ElementDecoration({elementId : '_title_title_id', widgetType : 'dijit.TitlePane', widgetAttrs : {title: 'Welcome to media', open: true}})); </script><h3>Welcome to media</h3><p>Spring Roo provides interactive, lightweight and user customizable tooling that enables rapid delivery of high performance enterprise Java applications.</p></div></div><div version="2.0" id="footer"><span><a href="/media/">Home</a></span><span id="language"> | Language: <a title="Switch language to English" href="?><img alt="Switch language to English" src="/media/resources/images/en.png" class="flag"/></a> </span><span> | Theme: <a title="standard" href="?theme=standard">standard</a> | <a title="alt" 
href="?theme=alt">alt</a></span><span><a title="Sponsored by SpringSource" href="http://springsource.com"><img src="/media/resources/images/springsource-logo.png" alt="Sponsored by SpringSource" align="right"/></a></span></div></div></div></body></html>
Filtered string: <?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE HTML SYSTEM "about:legacy-compat">
<html><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/><meta content="IE=8" http-equiv="X-UA-Compatible"/><link href="/media/resources/dijit/themes/tundra/tundra.css" type="text/css" rel="stylesheet"/><link href="/media/resources/styles/standard.css" media="screen" type="text/css" rel="stylesheet"/><link href="/media/resources/images/favicon.ico" rel="SHORTCUT ICON"/><script type="text/javascript">var djConfig = {parseOnLoad: false, isDebug: false, locale: 'zh-cn'};</script><script type="text/javascript" src="/media/resources/dojo/dojo.js"></script><script type="text/javascript" src="/media/resources/spring/Spring.js"></script><script type="text/javascript" src="/media/resources/spring/Spring-Dojo.js"></script><script type="text/javascript" language="JavaScript">dojo.require("dojo.parser");</script><title>Welcome to media</title></head><body class="tundra spring"><div id="wrapper"><div version="2.0" id="header"><a title="Home" name="Home" href="/media/"><img src="/media/resources/imageX/Xanner-graphic.png"/></a></div><div version="2.0" id="menu"></div><div id="main"><div version="2.0"><script type="text/javascript">dojo.require('dijit.TitlePane');</script><div id="_title_title_id"><script type="text/javascript">Spring.addDecoration(new Spring.ElementDecoration({elementId : '_title_title_id', widgetType : 'dijit.TitlePane', widgetAttrs : {title: 'Welcome to media', open: true}})); </script><h3>Welcome to media</h3><p>Spring Roo provides interactive, lightweight and user customizable tooling that enables rapid delivery of high performance enterprise Java applicatiXXX.</p></div></div><div version="2.0" id="footer"><span><a href="/media/">Home</a></span><span id="language"> | Language: <a title="Switch language to English" href="?><img alt="Switch language to English" src="/media/resources/images/en.png" class="flag"/></a> </span><span> | Theme: <a title="standard" href="?theme=standard">standard</a> | <a title="alt" 
href="?theme=alt">alt</a></span><span><a title="SpXXXored by SpringSource" href="http://springsource.com"><img src="/media/resources/images/springsource-logo.png" alt="SpXXXored by SpringSource" align="right"/></a></span></div></div></div></body></html>
Sensitive words: sb,ons,ons,ons
Sensitive word list length: 14
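The output shows matches inside tag attributes (the img src and the title/alt text) as well as in visible text. One possible way to leave markup alone is to skip everything between '<' and '>' while scanning; the filter class in this article has a filterHtml method along those lines, although doFilter calls filterText. The sketch below is a simplified, hypothetical illustration of the idea (TagSkipDemo and maskOutsideTags are not part of the article's code, and it matches a single word with startsWith rather than a trie).

```java
public class TagSkipDemo {
    /** Masks occurrences of word with the replacement character, but only outside of <...> tags. */
    static String maskOutsideTags(String html, String word, char replacement) {
        StringBuilder out = new StringBuilder(html);
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;            // entering a tag: stop matching
            } else if (c == '>') {
                inTag = false;           // leaving a tag: resume matching
            } else if (!inTag && html.startsWith(word, i)) {
                for (int k = i; k < i + word.length(); k++) {
                    out.setCharAt(k, replacement);
                }
                i += word.length() - 1;  // continue after the masked word
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <a title="Sponsored">applicatiXXX</a>
        System.out.println(maskOutsideTags("<a title=\"Sponsored\">applications</a>", "ons", 'X'));
    }
}
```

Note that this only protects the tags themselves; text between <script> and </script> is still ordinary content to such a scanner and would need separate handling.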
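Now for the implementation.
Five pieces need to be set up:
1. Add a filter, MySensitiveWordFilter.java.
2. Add a sensitive-word list, sensitive.txt, in the same directory as MySensitiveWordFilter.java.
3. Add FilteredResult.java to hold the result of a filtering run.
4. Add the filter configuration to web.xml.
5. Adjust the pom so that non-Java files are not dropped at packaging time.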
【1】
The filter that masks sensitive words:
MySensitiveWordFilter.java
package com.hcyg.media.core.util;
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;
import com.hcyg.media.core.util.SensitiveWord.FilteredResult;
public class MySensitiveWordFilter implements Filter {
    // Root of the trie that holds every sensitive word.
    private Node tree = new Node();

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Capture the downstream output in a buffer instead of sending it to the client.
        CharResponseWrapper wrapper = new CharResponseWrapper((HttpServletResponse) response);
        chain.doFilter(request, wrapper);
        String resStr = wrapper.toString();
        // filterText strips punctuation before matching, so markup such as "images/banner" can match "sb".
        FilteredResult res = filterText(resStr, 'X');
        System.out.println("11111111111111111111"); // debug marker
        System.out.println(res.getLevel()); // highest level among the detected sensitive words; 0 is the lowest
        System.out.println("Original string: " + res.getOriginalContent());
        System.out.println("Filtered string: " + res.getFilteredContent());
        System.out.println("Sensitive words: " + res.getBadWords());
        System.out.println("Sensitive word list length: " + res.getBadWords().length());
        String newStr = res.getFilteredContent();
        // Ask for the real writer only after the chain has run, so the downstream charset settings apply.
        PrintWriter out = response.getWriter();
        out.println(newStr);
    }

    // Response wrapper that collects everything written through getWriter() in memory.
    // Output written through getOutputStream() is not captured.
    class CharResponseWrapper extends HttpServletResponseWrapper {
        private CharArrayWriter output;

        public String toString() {
            return output.toString();
        }

        public CharResponseWrapper(HttpServletResponse response) {
            super(response);
            output = new CharArrayWriter();
        }

        public PrintWriter getWriter() {
            return new PrintWriter(output);
        }
    }

    public void destroy() {
    }

    /**
     * Loads the sensitive-word list when the filter is initialized.
     */
    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
        // sensitive.txt sits in the same package as this class, so a relative resource name resolves to it.
        String wordListPath = "sensitive.txt";
        try (InputStream is = MySensitiveWordFilter.class.getResourceAsStream(wordListPath)) {
            if (is == null) {
                throw new ServletException("Sensitive-word list not found on the classpath: " + wordListPath);
            }
            // Each line has the form word=level, so java.util.Properties can parse the file.
            Properties prop = new Properties();
            prop.load(new InputStreamReader(is, "UTF-8"));
            Enumeration<?> en = prop.propertyNames();
            while (en.hasMoreElements()) {
                String word = (String) en.nextElement();
                insertWord(word, Integer.parseInt(prop.getProperty(word).trim()));
            }
        } catch (IOException e) {
            throw new ServletException("Failed to load the sensitive-word list", e);
        }
    }

    // Adds one word to the trie and stores its level on the final node.
    private void insertWord(String word, int level) {
        Node node = tree;
        for (int i = 0; i < word.length(); i++) {
            node = node.addChar(word.charAt(i));
        }
        node.setEnd(true);
        node.setLevel(level);
    }

    // True for punctuation, separators, symbols, marks and control characters.
    private boolean isPunctuationChar(String c) {
        String regex = "[\\pP\\pZ\\pS\\pM\\pC]";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(c);
        return m.find();
    }

    // Removes punctuation and records, for every kept character, its index in the original string.
    private PunctuationOrHtmlFilteredResult filterPunctation(String originalString) {
        StringBuffer filteredString = new StringBuffer();
        ArrayList<Integer> charOffsets = new ArrayList<Integer>();
        for (int i = 0; i < originalString.length(); i++) {
            String c = String.valueOf(originalString.charAt(i));
            if (!isPunctuationChar(c)) {
                filteredString.append(c);
                charOffsets.add(Integer.valueOf(i));
            }
        }
        PunctuationOrHtmlFilteredResult result = new PunctuationOrHtmlFilteredResult();
        result.setOriginalString(originalString);
        result.setFilteredString(filteredString);
        result.setCharOffsets(charOffsets);
        return result;
    }

    // Same as filterPunctation, but also skips everything between '<' and '>'.
    private PunctuationOrHtmlFilteredResult filterPunctationAndHtml(String originalString) {
        StringBuffer filteredString = new StringBuffer();
        ArrayList<Integer> charOffsets = new ArrayList<Integer>();
        for (int i = 0; i < originalString.length(); i++) {
            String c = String.valueOf(originalString.charAt(i));
            if (originalString.charAt(i) == '<') {
                // Look for the closing '>' so the tag itself is never matched.
                int k;
                for (k = i + 1; k < originalString.length(); k++) {
                    if (originalString.charAt(k) == '<') {
                        // A second '<' before any '>': not a tag, so skip only the first '<'.
                        k = i;
                        break;
                    }
                    if (originalString.charAt(k) == '>') {
                        break;
                    }
                }
                i = k;
            } else if (!isPunctuationChar(c)) {
                filteredString.append(c);
                charOffsets.add(Integer.valueOf(i));
            }
        }
        PunctuationOrHtmlFilteredResult result = new PunctuationOrHtmlFilteredResult();
        result.setOriginalString(originalString);
        result.setFilteredString(filteredString);
        result.setCharOffsets(charOffsets);
        return result;
    }

    // Scans the stripped text with the trie and masks every match in the original text.
    private FilteredResult filter(PunctuationOrHtmlFilteredResult pohResult, char replacement) {
        StringBuffer sentence = pohResult.getFilteredString();
        ArrayList<Integer> charOffsets = pohResult.getCharOffsets();
        StringBuffer resultString = new StringBuffer(pohResult.getOriginalString());
        StringBuffer badWords = new StringBuffer();
        int level = 0;
        for (int i = 0; i < sentence.length(); i++) {
            int start = i;
            int end = i;
            int matchLevel = 0;
            Node node = tree;
            // Walk the trie from position i and remember the longest word found.
            for (int j = i; j < sentence.length(); j++) {
                node = node.findChar(sentence.charAt(j));
                if (node == null) {
                    break;
                }
                if (node.isEnd()) {
                    end = j;
                    matchLevel = node.getLevel();
                }
            }
            // end > start means a word of at least two characters matched.
            if (end > start) {
                level = Math.max(level, matchLevel);
                for (int k = start; k <= end; k++) {
                    // Map the position in the stripped text back to the original text.
                    resultString.setCharAt(charOffsets.get(k).intValue(), replacement);
                }
                if (badWords.length() > 0) {
                    badWords.append(",");
                }
                badWords.append(sentence.substring(start, end + 1));
                i = end;
            }
        }
        FilteredResult result = new FilteredResult();
        result.setOriginalContent(pohResult.getOriginalString());
        result.setFilteredContent(resultString.toString());
        result.setBadWords(badWords.toString());
        result.setLevel(Integer.valueOf(level));
        return result;
    }

    // Masks matches directly in the given text, without stripping punctuation first.
    public String simpleFilter(String sentence, char replacement) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < sentence.length(); i++) {
            int start = i;
            int end = i;
            Node node = tree;
            for (int j = i; j < sentence.length(); j++) {
                node = node.findChar(sentence.charAt(j));
                if (node == null) {
                    break;
                }
                if (node.isEnd()) {
                    end = j;
                }
            }
            if (end > start) {
                for (int k = start; k <= end; k++) {
                    sb.append(replacement);
                }
                i = end;
            } else {
                sb.append(sentence.charAt(i));
            }
        }
        return sb.toString();
    }

    // Filters plain text: only punctuation is ignored while matching.
    public FilteredResult filterText(String originalString, char replacement) {
        return filter(filterPunctation(originalString), replacement);
    }

    // Filters HTML: punctuation and everything inside tags is ignored while matching.
    public FilteredResult filterHtml(String originalString, char replacement) {
        return filter(filterPunctationAndHtml(originalString), replacement);
    }

    // Holds the stripped text together with the offsets back into the original text.
    private class PunctuationOrHtmlFilteredResult {
        private String originalString;
        private StringBuffer filteredString;
        private ArrayList<Integer> charOffsets;

        public String getOriginalString() {
            return this.originalString;
        }

        public void setOriginalString(String originalString) {
            this.originalString = originalString;
        }

        public StringBuffer getFilteredString() {
            return this.filteredString;
        }

        public void setFilteredString(StringBuffer filteredString) {
            this.filteredString = filteredString;
        }

        public ArrayList<Integer> getCharOffsets() {
            return this.charOffsets;
        }

        public void setCharOffsets(ArrayList<Integer> charOffsets) {
            this.charOffsets = charOffsets;
        }
    }

    // One trie node: children keyed by character, plus an end-of-word flag and the word's level.
    class Node {
        private Map<String, Node> children = new HashMap<String, Node>();
        private boolean isEnd = false;
        private int level = 0;

        // Returns the child for c, creating it if necessary.
        public Node addChar(char c) {
            String cStr = String.valueOf(c);
            Node node = this.children.get(cStr);
            if (node == null) {
                node = new Node();
                this.children.put(cStr, node);
            }
            return node;
        }

        // Returns the child for c, or null if there is none.
        public Node findChar(char c) {
            String cStr = String.valueOf(c);
            return this.children.get(cStr);
        }

        public boolean isEnd() {
            return this.isEnd;
        }

        public void setEnd(boolean isEnd) {
            this.isEnd = isEnd;
        }

        public int getLevel() {
            return this.level;
        }

        public void setLevel(int level) {
            this.level = level;
        }
    }
}
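To see the trie matching from filter and simpleFilter in isolation, here is a minimal standalone sketch. TrieMaskDemo, TrieNode, insert and mask are illustrative names, not part of the class above, and one detail differs on purpose: it accepts a match when end >= i, so single-character words are masked too, whereas the article's code requires end > start.

```java
import java.util.HashMap;
import java.util.Map;

public class TrieMaskDemo {
    static final class TrieNode {
        final Map<Character, TrieNode> children = new HashMap<>();
        boolean end; // true if a word ends at this node
    }

    private final TrieNode root = new TrieNode();

    /** Adds a word to the trie, one node per character. */
    void insert(String word) {
        TrieNode node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, key -> new TrieNode());
        }
        node.end = true;
    }

    /** Masks the longest listed word starting at each position. */
    String mask(String text, char replacement) {
        StringBuilder out = new StringBuilder(text);
        for (int i = 0; i < text.length(); i++) {
            TrieNode node = root;
            int end = -1; // index of the last character of the longest match starting at i
            for (int j = i; j < text.length(); j++) {
                node = node.children.get(text.charAt(j));
                if (node == null) {
                    break;
                }
                if (node.end) {
                    end = j;
                }
            }
            if (end >= i) {
                for (int k = i; k <= end; k++) {
                    out.setCharAt(k, replacement);
                }
                i = end; // resume scanning after the match
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        TrieMaskDemo demo = new TrieMaskDemo();
        demo.insert("ons");
        demo.insert("onset");
        // prints applicatiXXX XXXXXs
        System.out.println(demo.mask("applications onsets", 'X'));
    }
}
```

Because the inner loop keeps walking after the first complete word, "onset" wins over "ons" where both match, which is the same longest-match behaviour as the filter above.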
【2】
The sensitive-word list. Search online for one; any list with the same word=level structure will do.
sensitive.txt
加qq=4
敏感詞=4
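Since each line is word=level, the file can be read with java.util.Properties, which is what the filter's init method does. A small sketch of that parsing step follows, reading from a string instead of the classpath; WordListDemo and parse are hypothetical names used only here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class WordListDemo {
    /** Parses word=level lines using the standard Properties text format. */
    static Properties parse(String content) {
        Properties prop = new Properties();
        try {
            prop.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return prop;
    }

    public static void main(String[] args) {
        Properties prop = parse("加qq=4\n敏感詞=4\n");
        for (String word : prop.stringPropertyNames()) {
            int level = Integer.parseInt(prop.getProperty(word));
            System.out.println(word + " -> level " + level);
        }
    }
}
```

One consequence of using the Properties format: lines starting with # or ! are treated as comments, and words containing =, : or whitespace would need a backslash escape in the file.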
【3】FilteredResult holds the result of a filtering run
/**
 *@Copyright:Copyright (c) 2008 - 2100
 *@Company:hcyg
 */
package com.hcyg.media.core.util.SensitiveWord;

/**
 *@Title:FilteredResult
 *@Description:
 *@Author:zp
 *@Since:2015-8-1
 *@Version:1.0.0
 */
public class FilteredResult
{
    private Integer level;           // level of the detected words
    private String filteredContent;  // text after masking
    private String badWords;         // comma-separated list of detected words
    private String originalContent;  // text before masking

    public FilteredResult() {}

    public FilteredResult(String originalContent, String filteredContent, Integer level, String badWords)
    {
        this.originalContent = originalContent;
        this.filteredContent = filteredContent;
        this.level = level;
        this.badWords = badWords;
    }

    public String getBadWords()
    {
        return this.badWords;
    }

    public void setBadWords(String badWords)
    {
        this.badWords = badWords;
    }

    public Integer getLevel()
    {
        return this.level;
    }

    public void setLevel(Integer level)
    {
        this.level = level;
    }

    public String getFilteredContent()
    {
        return this.filteredContent;
    }

    public void setFilteredContent(String filteredContent)
    {
        this.filteredContent = filteredContent;
    }

    public String getOriginalContent()
    {
        return this.originalContent;
    }

    public void setOriginalContent(String originalContent)
    {
        this.originalContent = originalContent;
    }
}
【4】Add the filter configuration to web.xml
<filter>
    <filter-name>MySensitiveWordFilter</filter-name>
    <filter-class>com.hcyg.media.core.util.MySensitiveWordFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>MySensitiveWordFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
【5】Configure the pom as follows; otherwise non-Java files are dropped at packaging time
<build>
    <finalName>media</finalName>
    <resources>
        <resource>
            <directory>src/main/java</directory>
            <includes>
                <include>**/*.properties</include>
                <include>**/*.xml</include>
                <include>**/*.txt</include>
            </includes>
            <!-- whether to substitute properties inside the resource files -->
            <filtering>false</filtering>
        </resource>
        <resource>
            <directory>src/main/resources</directory>
            <includes>
                <include>**/*.properties</include>
                <include>**/*.xml</include>
            </includes>
            <filtering>true</filtering>
        </resource>
    </resources>
    ...
</build>
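A quick way to confirm that the packaging step kept a file is to ask the class loader for it: Class.getResource returns null when the resource is not on the classpath. The helper below is a hypothetical sketch (ResourceCheck and isPackaged are not part of the article's code); in the project it would be called with MySensitiveWordFilter.class and "sensitive.txt".

```java
public class ResourceCheck {
    /** Returns true if the named resource can be found relative to the anchor class's package. */
    static boolean isPackaged(Class<?> anchor, String name) {
        return anchor.getResource(name) != null;
    }

    public static void main(String[] args) {
        // A name that is not packaged anywhere yields false.
        System.out.println(isPackaged(ResourceCheck.class, "no-such-file-for-this-demo.txt"));
    }
}
```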