
Filtering sensitive words in the response with a servlet filter

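This article is an imperfect example: it adds a sensitive-word filter that runs as the response is returned.

I call it imperfect because testing showed that normal, structural parts of the page were filtered as well.

Compare the two pieces of text below in UltraEdit and you will see that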

<img src="/media/resources/images/banner-graphic.png"/>

was replaced with the following:

<img src="/media/resources/imageX/Xanner-graphic.png"/>
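
The cause can be reproduced in isolation. The filter's filterText method removes punctuation before matching, so the `s` that ends `images` and the `b` that starts `banner` become neighbours and match the list entry `sb` that appears in the output below. The following is a minimal sketch of that effect only; the class name PunctuationStripDemo and the method strip are made up for illustration, while the regular expression is the one used by isPunctuationChar.

```java
import java.util.regex.Pattern;

class PunctuationStripDemo {
    // Same character classes as isPunctuationChar in the filter:
    // punctuation, separators, symbols, marks and control characters.
    private static final Pattern STRIP = Pattern.compile("[\\pP\\pZ\\pS\\pM\\pC]");

    // Hypothetical helper: returns the text with all of those characters removed.
    static String strip(String s) {
        return STRIP.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String src = "/media/resources/images/banner-graphic.png";
        String stripped = strip(src);
        // The original path has no "sb" in it, the stripped one does.
        System.out.println(src.contains("sb"));
        System.out.println(stripped);
        System.out.println(stripped.contains("sb"));
    }
}
```

Because the filter maps each kept character back to its original offset, the two matched characters are replaced in place, which leaves the `/` between them untouched and yields `imageX/Xanner`.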

11111111111111111111
4
Original string:<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE HTML SYSTEM "about:legacy-compat">
<html><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/><meta content="IE=8" http-equiv="X-UA-Compatible"/><link href="/media/resources/dijit/themes/tundra/tundra.css" type="text/css" rel="stylesheet"/><link href="/media/resources/styles/standard.css" media="screen" type="text/css" rel="stylesheet"/><link href="/media/resources/images/favicon.ico" rel="SHORTCUT ICON"/><script type="text/javascript">var djConfig = {parseOnLoad: false, isDebug: false, locale: 'zh-cn'};</script><script type="text/javascript" src="/media/resources/dojo/dojo.js"></script><script type="text/javascript" src="/media/resources/spring/Spring.js"></script><script type="text/javascript" src="/media/resources/spring/Spring-Dojo.js"></script><script type="text/javascript" language="JavaScript">dojo.require("dojo.parser");</script><title>Welcome to media</title></head><body class="tundra spring"><div id="wrapper"><div version="2.0" id="header"><a title="Home" name="Home" href="/media/"><img src="/media/resources/images/banner-graphic.png"/></a></div><div version="2.0" id="menu"></div><div id="main"><div version="2.0"><script type="text/javascript">dojo.require('dijit.TitlePane');</script><div id="_title_title_id"><script type="text/javascript">Spring.addDecoration(new Spring.ElementDecoration({elementId : '_title_title_id', widgetType : 'dijit.TitlePane', widgetAttrs : {title: 'Welcome to media', open: true}})); </script><h3>Welcome to media</h3><p>Spring Roo provides interactive, lightweight and user customizable tooling that enables rapid delivery of high performance enterprise Java applications.</p></div></div><div version="2.0" id="footer"><span><a href="/media/">Home</a></span><span id="language"> | Language: <a title="Switch language to English" href="?><img alt="Switch language to English" src="/media/resources/images/en.png" class="flag"/></a> </span><span> | Theme: <a title="standard" href="?theme=standard">standard</a> | <a title="alt" 
href="?theme=alt">alt</a></span><span><a title="Sponsored by SpringSource" href="http://springsource.com"><img src="/media/resources/images/springsource-logo.png" alt="Sponsored by SpringSource" align="right"/></a></span></div></div></div></body></html>
Filtered string:<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE HTML SYSTEM "about:legacy-compat">
<html><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/><meta content="IE=8" http-equiv="X-UA-Compatible"/><link href="/media/resources/dijit/themes/tundra/tundra.css" type="text/css" rel="stylesheet"/><link href="/media/resources/styles/standard.css" media="screen" type="text/css" rel="stylesheet"/><link href="/media/resources/images/favicon.ico" rel="SHORTCUT ICON"/><script type="text/javascript">var djConfig = {parseOnLoad: false, isDebug: false, locale: 'zh-cn'};</script><script type="text/javascript" src="/media/resources/dojo/dojo.js"></script><script type="text/javascript" src="/media/resources/spring/Spring.js"></script><script type="text/javascript" src="/media/resources/spring/Spring-Dojo.js"></script><script type="text/javascript" language="JavaScript">dojo.require("dojo.parser");</script><title>Welcome to media</title></head><body class="tundra spring"><div id="wrapper"><div version="2.0" id="header"><a title="Home" name="Home" href="/media/"><img src="/media/resources/imageX/Xanner-graphic.png"/></a></div><div version="2.0" id="menu"></div><div id="main"><div version="2.0"><script type="text/javascript">dojo.require('dijit.TitlePane');</script><div id="_title_title_id"><script type="text/javascript">Spring.addDecoration(new Spring.ElementDecoration({elementId : '_title_title_id', widgetType : 'dijit.TitlePane', widgetAttrs : {title: 'Welcome to media', open: true}})); </script><h3>Welcome to media</h3><p>Spring Roo provides interactive, lightweight and user customizable tooling that enables rapid delivery of high performance enterprise Java applicatiXXX.</p></div></div><div version="2.0" id="footer"><span><a href="/media/">Home</a></span><span id="language"> | Language: <a title="Switch language to English" href="?><img alt="Switch language to English" src="/media/resources/images/en.png" class="flag"/></a> </span><span> | Theme: <a title="standard" href="?theme=standard">standard</a> | <a title="alt" 
href="?theme=alt">alt</a></span><span><a title="SpXXXored by SpringSource" href="http://springsource.com"><img src="/media/resources/images/springsource-logo.png" alt="SpXXXored by SpringSource" align="right"/></a></span></div></div></div></body></html>
Sensitive word list:sb,ons,ons,ons
Sensitive word list length:14
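
Most of the damage in the filtered output above sits inside tags (the img src value and the title and alt attributes). One possible remedy, sketched here under assumptions, is to match only against characters that lie outside `<` and `>` while recording where each kept character came from, so that replacements can be written back into the full markup. TagSkippingSketch and its two methods are hypothetical names, and a plain indexOf stands in for the trie lookup used by the real filter.

```java
import java.util.ArrayList;
import java.util.List;

class TagSkippingSketch {
    // Collects the characters outside <...> and, for each one,
    // the offset it had in the original string.
    static String textOutsideTags(String html, List<Integer> offsets) {
        StringBuilder text = new StringBuilder();
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>') {
                inTag = false;
            } else if (!inTag) {
                text.append(c);
                offsets.add(i);
            }
        }
        return text.toString();
    }

    // Masks every occurrence of word in the tag-free text,
    // writing the replacement back through the recorded offsets.
    static String maskOutsideTags(String html, String word, char replacement) {
        if (word.isEmpty()) {
            return html; // nothing to search for
        }
        List<Integer> offsets = new ArrayList<>();
        String text = textOutsideTags(html, offsets);
        StringBuilder out = new StringBuilder(html);
        int from = 0;
        int hit;
        while ((hit = text.indexOf(word, from)) >= 0) {
            for (int k = hit; k < hit + word.length(); k++) {
                out.setCharAt(offsets.get(k), replacement);
            }
            from = hit + word.length();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<img alt=\"Sponsored by SpringSource\"/><p>Java applications.</p>";
        System.out.println(maskOutsideTags(html, "ons", 'X'));
    }
}
```

In this sketch the body of a script element still counts as text outside tags, so a real filter would probably need to skip script and style content as well. A match can also span a tag boundary, because the tag characters are simply left out of the matched text.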
           

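Now, here is how it is set up.

There are five parts to configure:

1. Add a filter, MySensitiveWordFilter.java.

2. Add a sensitive-word list, sensitive.txt (in the same directory as MySensitiveWordFilter.java).

3. Add FilteredResult.java to hold the filtering result.

4. Add the filter configuration to web.xml.

5. Adjust the pom settings so that non-Java files are not dropped during packaging.

MySensitiveWordFilter.java and sensitive.txt go in the same folder.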

【1】

This is the filter that performs the sensitive-word filtering.

MySensitiveWordFilter.java

package com.hcyg.media.core.util;

import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;

import com.hcyg.media.core.util.SensitiveWord.FilteredResult;

public class MySensitiveWordFilter implements Filter {
    // private WordFilterUtil wordFilterUtil ;

    private final String ENCODING = null;
    private Node tree = new Node();

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException,
            ServletException {
        PrintWriter out = response.getWriter();
        CharResponseWrapper wrapper = new CharResponseWrapper((HttpServletResponse) response);
        chain.doFilter(request, wrapper);
        String resStr = wrapper.toString();
        FilteredResult res = filterText(resStr, 'X');
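        System.out.println("11111111111111111111");
        System.out.println(res.getLevel());// level of the last sensitive word matched; stays 0 if nothing matched
        System.out.println("Original string:" + res.getOriginalContent());// original string
        System.out.println("Filtered string:" + res.getFilteredContent());// filtered string
        System.out.println("Sensitive word list:" + res.getBadWords());// sensitive words found, comma-separated
        System.out.println("Sensitive word list length:" + res.getBadWords().length());// character length of that list, not the number of words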

        String newStr = res.getFilteredContent();
        out.println(newStr);
    }

    class CharResponseWrapper extends HttpServletResponseWrapper {
        private CharArrayWriter output;

        public String toString() {
            return output.toString();
        }

        public CharResponseWrapper(HttpServletResponse response) {
            super(response);
            output = new CharArrayWriter();
        }

        public PrintWriter getWriter() {
            return new PrintWriter(output);
        }

    }

    public void destroy() {
    }

    /**
     * Loads the word list when the filter is initialized.
     */
    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
        // Read the file
        String app = System.getProperty("user.dir");
        InputStream is = null;
        try {
            // WordFilterUtil.class.getResourceAsStream("/SensitiveWord.txt");

            // InputStreamReader reader = new InputStreamReader(new
            // FileInputStream(file), ENCODING);

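            String s_xmlpath = "./sensitive.txt";
            is = MySensitiveWordFilter.class.getResourceAsStream(s_xmlpath);
            if (is == null) {
                // getResourceAsStream returns null when the file is not on the classpath
                throw new ServletException("sensitive.txt not found next to MySensitiveWordFilter.class");
            }

            InputStreamReader reader = new InputStreamReader(is, "UTF-8");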
            Properties prop = new Properties();
            prop.load(reader);
            Enumeration en = prop.propertyNames();

            while (en.hasMoreElements()) {
                String word = (String) en.nextElement();
                insertWord(word, Integer.valueOf(prop.getProperty(word)).intValue());
            }
        } catch (IOException e) {
            // Also covers UnsupportedEncodingException, which is a subclass of IOException.
            // The stream itself is closed once, in the finally block.
            e.printStackTrace();
        } finally {
            if (is != null) {
                try {
                    is.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    private void insertWord(String word, int level) {
        Node node = tree;
        for (int i = 0; i < word.length(); i++) {
            node = node.addChar(word.charAt(i));
        }
        node.setEnd(true);
        node.setLevel(level);
    }

    private boolean isPunctuationChar(String c) {
        String regex = "[\\pP\\pZ\\pS\\pM\\pC]";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(c);
        return m.find();
    }

    private PunctuationOrHtmlFilteredResult filterPunctation(String originalString) {
        StringBuffer filteredString = new StringBuffer();
        ArrayList<Integer> charOffsets = new ArrayList<Integer>();
        for (int i = 0; i < originalString.length(); i++) {
            String c = String.valueOf(originalString.charAt(i));
            if (!isPunctuationChar(c)) {
                filteredString.append(c);
                charOffsets.add(Integer.valueOf(i));
            }
        }
        PunctuationOrHtmlFilteredResult result = new PunctuationOrHtmlFilteredResult();
        result.setOriginalString(originalString);
        result.setFilteredString(filteredString);
        result.setCharOffsets(charOffsets);
        return result;
    }

    private PunctuationOrHtmlFilteredResult filterPunctationAndHtml(String originalString) {
        StringBuffer filteredString = new StringBuffer();
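        ArrayList<Integer> charOffsets = new ArrayList<Integer>();
        int i = 0;
        for (int k = 0; i < originalString.length(); i++) {
            String c = String.valueOf(originalString.charAt(i));
            if (originalString.charAt(i) == '<') {
                // Scan forward for the '>' that closes this tag.
                for (k = i + 1; k < originalString.length(); k++) {
                    if (originalString.charAt(k) == '<') {
                        // A second '<' before any '>': give up on this tag and stop.
                        // Without the break this loop would restart from i forever.
                        k = i;
                        break;
                    }
                    if (originalString.charAt(k) == '>') {
                        break;
                    }
                }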
                i = k;
            } else if (!isPunctuationChar(c)) {
                filteredString.append(c);
                charOffsets.add(Integer.valueOf(i));
            }
        }
        PunctuationOrHtmlFilteredResult result = new PunctuationOrHtmlFilteredResult();
        result.setOriginalString(originalString);
        result.setFilteredString(filteredString);
        result.setCharOffsets(charOffsets);
        return result;
    }

    private   FilteredResult filter(PunctuationOrHtmlFilteredResult pohResult, char replacement) {
        StringBuffer sentence = pohResult.getFilteredString();
        ArrayList<Integer> charOffsets = pohResult.getCharOffsets();
        StringBuffer resultString = new StringBuffer(pohResult.getOriginalString());
        StringBuffer badWords = new StringBuffer();
        int level = 0;
        Node node = tree;
        int start = 0;
        int end = 0;
        for (int i = 0; i < sentence.length(); i++) {
            start = i;
            end = i;
            node = tree;
            for (int j = i; j < sentence.length(); j++) {
                node = node.findChar(sentence.charAt(j));
                if (node == null) {
                    break;
                }
                if (node.isEnd()) {
                    end = j;
                    level = node.getLevel();
                }
            }
            if (end > start) {
                for (int k = start; k <= end; k++) {
                    resultString.setCharAt(((Integer) charOffsets.get(k)).intValue(), replacement);
                }
                if (badWords.length() > 0) {
                    badWords.append(",");
                }
                badWords.append(sentence.substring(start, end + 1));
                i = end;
            }
        }
        FilteredResult result = new FilteredResult();
        result.setOriginalContent(pohResult.getOriginalString());
        result.setFilteredContent(resultString.toString());
        result.setBadWords(badWords.toString());
        result.setLevel(Integer.valueOf(level));
        return result;
    }

    public   String simpleFilter(String sentence, char replacement) {
        StringBuffer sb = new StringBuffer();
        Node node = tree;
        int start = 0;
        int end = 0;
        for (int i = 0; i < sentence.length(); i++) {
            start = i;
            end = i;
            node = tree;
            for (int j = i; j < sentence.length(); j++) {
                node = node.findChar(sentence.charAt(j));
                if (node == null) {
                    break;
                }
                if (node.isEnd()) {
                    end = j;
                }
            }
            if (end > start) {
                for (int k = start; k <= end; k++) {
                    sb.append(replacement);
                }
                i = end;
            } else {
                sb.append(sentence.charAt(i));
            }
        }
        return sb.toString();
    }

    public FilteredResult filterText(String originalString, char replacement) {
        return filter(filterPunctation(originalString), replacement);
    }

    public FilteredResult filterHtml(String originalString, char replacement) {
        return filter(filterPunctationAndHtml(originalString), replacement);
    }

    private class PunctuationOrHtmlFilteredResult {
        private String originalString;
        private StringBuffer filteredString;
        private ArrayList<Integer> charOffsets;

        public String getOriginalString() {
            return this.originalString;
        }

        public void setOriginalString(String originalString) {
            this.originalString = originalString;
        }

        public StringBuffer getFilteredString() {
            return this.filteredString;
        }

        public void setFilteredString(StringBuffer filteredString) {
            this.filteredString = filteredString;
        }

        public ArrayList<Integer> getCharOffsets() {
            return this.charOffsets;
        }

        public void setCharOffsets(ArrayList<Integer> charOffsets) {
            this.charOffsets = charOffsets;
        }
    }

    class Node {
        private Map<String, Node> children = new HashMap<String, Node>();
        private boolean isEnd = false;
        private int level = 0;

        public Node addChar(char c) {
            String cStr = String.valueOf(c);
            Node node = (Node) this.children.get(cStr);
            if (node == null) {
                node = new Node();
                this.children.put(cStr, node);
            }
            return node;
        }

        public Node findChar(char c) {
            String cStr = String.valueOf(c);
            return (Node) this.children.get(cStr);
        }

        public boolean isEnd() {
            return this.isEnd;
        }

        public void setEnd(boolean isEnd) {
            this.isEnd = isEnd;
        }

        public int getLevel() {
            return this.level;
        }

        public void setLevel(int level) {
            this.level = level;
        }
    }

}
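
Stripped of offsets and levels, the matching that filter and simpleFilter perform is a longest-match walk over a character trie. The sketch below is a simplified illustration rather than the class above: TrieFilterSketch, insert and mask are names made up here, and it uses -1 as the "no match" marker so that one-character words would match too (the class above tests end > start, which skips them). The words `sb` and `ons` are taken from the word list printed in the test output.

```java
import java.util.HashMap;
import java.util.Map;

class TrieFilterSketch {
    // One trie node: children keyed by character, plus an end-of-word flag.
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean end;
    }

    private final Node root = new Node();

    // Adds a word to the trie, creating nodes along the way as needed.
    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, key -> new Node());
        }
        node.end = true;
    }

    // Replaces the longest match starting at each position with the replacement character.
    String mask(String text, char replacement) {
        StringBuilder out = new StringBuilder(text);
        for (int i = 0; i < text.length(); i++) {
            Node node = root;
            int end = -1; // index of the last character of the longest match found from i
            for (int j = i; j < text.length(); j++) {
                node = node.children.get(text.charAt(j));
                if (node == null) {
                    break;
                }
                if (node.end) {
                    end = j;
                }
            }
            if (end >= 0) {
                for (int k = i; k <= end; k++) {
                    out.setCharAt(k, replacement);
                }
                i = end; // continue after the masked word
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        TrieFilterSketch f = new TrieFilterSketch();
        f.insert("sb");
        f.insert("ons");
        System.out.println(f.mask("enterprise Java applications.", 'X'));
    }
}
```

Matching is on raw substrings, which is why `applications` and `Sponsored` are both altered in the test output: each contains `ons`.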
           

【2】

This is the sensitive-word list. Search online for one; any list with the same structure will do.

sensitive.txt

加qq=4
敏感詞=4
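
Each line has the form word=level, and init reads the file with java.util.Properties, so the word becomes the property key and the level its value. The following is a minimal sketch of that parsing step on its own; WordListDemo and parse are hypothetical names, and the sample content is the two lines shown above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

class WordListDemo {
    // Parses "word=level" lines into a map from word to level.
    static Map<String, Integer> parse(String content) {
        Properties prop = new Properties();
        try {
            prop.load(new StringReader(content));
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        Map<String, Integer> levels = new TreeMap<>();
        for (String word : prop.stringPropertyNames()) {
            // trim() is an addition in this sketch; it guards against trailing spaces after the number.
            levels.put(word, Integer.valueOf(prop.getProperty(word).trim()));
        }
        return levels;
    }

    public static void main(String[] args) {
        Map<String, Integer> levels = parse("加qq=4\n敏感詞=4\n");
        System.out.println(levels);
    }
}
```

Since this is the properties format, lines starting with `#` or `!` are treated as comments, and a word that contains a space, `:` or `=` would need escaping with a backslash to be read as a single key.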
           

【3】FilteredResult, the filtering result

/**
 *@Copyright:Copyright (c) 2008 - 2100
 *@Company:hcyg
 */
package com.hcyg.media.core.util.SensitiveWord;


/**
 *@Title:FilteredResult
 *@Description:
 *@Author:zp
 *@Since:2015-8-1
 *@Version:1.0.0
 */ 
public class FilteredResult
{
  private Integer level;
  private String filteredContent;
  private String badWords;
  private String originalContent;

  public String getBadWords()
  {
    return this.badWords;
  }

  public void setBadWords(String badWords)
  {
    this.badWords = badWords;
  }

  public FilteredResult() {}

  public FilteredResult(String originalContent, String filteredContent, Integer level, String badWords)
  {
    this.originalContent = originalContent;
    this.filteredContent = filteredContent;
    this.level = level;
    this.badWords = badWords;
  }

  public Integer getLevel()
  {
    return this.level;
  }

  public void setLevel(Integer level)
  {
    this.level = level;
  }

  public String getFilteredContent()
  {
    return this.filteredContent;
  }

  public void setFilteredContent(String filteredContent)
  {
    this.filteredContent = filteredContent;
  }

  public String getOriginalContent()
  {
    return this.originalContent;
  }

  public void setOriginalContent(String originalContent)
  {
    this.originalContent = originalContent;
  }
}
           

【4】Add the filter configuration to web.xml.

<filter>
        <filter-name>MySensitiveWordFilter</filter-name>
        <filter-class>com.hcyg.media.core.util.MySensitiveWordFilter</filter-class>

    </filter>

    <filter-mapping>
        <filter-name>MySensitiveWordFilter</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>
           

【5】Set up the pom as follows; otherwise non-Java files are dropped during packaging.

<build>
        <finalName>media</finalName>
        <resources>
            <resource>
                <directory>src/main/java</directory>
                <includes>
                    <include>**/*.properties</include>
                    <include>**/*.xml</include>
                    <include>**/*.txt</include>
                </includes>
                <!-- Whether to substitute properties inside the resources -->
                <filtering>false</filtering>
            </resource>
            <resource>
                <directory>src/main/resources</directory>
                <includes>
                    <include>**/*.properties</include>
                    <include>**/*.xml</include>
                </includes>
                <filtering>true</filtering>
            </resource>
        </resources>
        ...
    </build>