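Much released software ships with a set of HTML help files. Viewed directly in a browser, their biggest problem is the lack of full-text search: unless you already know which page holds the content you want, it is hard to find.
CHM (Compressed HTML Help) is a help file format that Microsoft introduced in the late 1990s. Because later development was discontinued, the usable version stops at 1.3, but the main features are all there: packaging, compression, a table of contents, an index tree, and full-text search.
The binary structure of CHM files and the contents of the configuration files are described at: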
http://www.nongnu.org/chmspec/latest/index.html
Generating a CHM file from HTML requires at least three configuration files (.hhp, .hhc, .hhk), which are then compiled with HTML Help Workshop.
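I tried several tools from the web that export an HTML help directory to a CHM file, but I was not satisfied with the titles they displayed. The indexes they generated were just HTML file names, which are practically useless, so I decided to write my own tool.
Perl is my first choice for small tools. If you do not like Perl, you can read the implementation ideas and write the code yourself.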
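Operating system
Windows. CHM viewers also exist for other platforms.
Runtime environment
Perl: on Windows you can install ActivePerl or Strawberry Perl; both are available free of charge.
HTML Help Workshop: installing some versions of Visual Studio also installs HTML Help Workshop under "C:\Program Files". If it is missing from your machine, search the web; a standalone installer is available.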
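1. Wrapping the script as a bat file
Since the tool is expected to be used on Windows, wrapping it as a bat file makes it easy to invoke from the right-click context menu. Non-Windows users can skip this part.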
@rem = '
perl %0 %*
goto :end
';
# Real perl code is from here
__END__
:end
pause
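In a bat file, @rem is the comment command rem, with @ suppressing the echo of the line. In Perl, the same text is a list assignment that spans lines 1 to 4. Do not insert any other Perl code, such as #! or use strict, before the first line, or running the bat file will fail.
The second line calls perl.exe to do the real work; "%0 %*" stands for the script file name and the arguments passed to the script.
The third line jumps to the end of the file. Because the bat interpreter only checks the syntax of lines it actually executes, everything between this line and the :end line can use non-bat syntax (Perl here). Perl, in turn, stops reading at __END__, so it never sees the trailing bat lines.
The final pause keeps the console open so you can read the output. If you do not need it, put rem in front of it.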
2. Using HTML titles as node names in the CHM contents tree
The content of each HTML file's title tag is extracted with HTML::TreeBuilder and HTML::Element:
my $tree = HTML::TreeBuilder->new_from_file($file);
return $default unless defined $tree;
my $title = $tree->find("title");
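($title = $title->as_text()) =~ s/^\s+|\s+$//g if $title;
$tree->delete();
The second line returns the default title passed to the subroutine when no parse tree can be built for the file (for example, because it is not an HTML file).
The fourth line takes the text of the title tag and strips leading and trailing whitespace.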
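The trimming and fallback logic can be tried without installing HTML::TreeBuilder. The following is only a sketch under that assumption: it swaps the real parser for a regular expression, so it is suitable for simple, well-formed pages only, and the name title_from_html is hypothetical.

```perl
use strict;
use warnings;

# Return the trimmed <title> text of an HTML string, or $default if none.
# NOTE: a regex stand-in for HTML::TreeBuilder, for illustration only.
sub title_from_html {
    my ($html, $default) = @_;
    return $default unless defined $html;
    my ($title) = $html =~ m{<title[^>]*>(.*?)</title>}is;
    return $default unless defined $title;
    $title =~ s/^\s+|\s+$//g;   # strip leading and trailing whitespace
    $title =~ s/\s+/ /g;        # collapse inner runs such as line breaks
    return length $title ? $title : $default;
}

# A page with a padded title, then a file with no markup at all.
print title_from_html("<html><head><title>\n  Foo   Bar </title></head></html>", 'x.html'), "\n";
print title_from_html("plain text, no markup", 'x.html'), "\n";
```

The first call prints "Foo Bar" and the second falls back to "x.html", which mirrors how the file name serves as the default title.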
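3. Associating directories with index pages
The entry point of an HTML help system is usually index.html. In addition, some help systems pair a directory with an HTML file of the same name. Where such an associated HTML file exists, it is shown in the CHM contents tree in place of the bare directory.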
my @suffix = ('.htm', '.html', '/index.htm', '/index.html' );
foreach my $suffix (@suffix) {
my $path = "$dir$suffix";
return $path if -f $path;
}
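This code looks for the associated HTML file in order: first an HTML file with the same name as the directory, located beside it, then index.htm or index.html inside the directory.
glob and File::Find are avoided mainly because a help system can contain a large number of directories and files, and calling these lookup functions frequently sometimes produced inexplicable errors.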
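To see the lookup order in action, here is a small self-contained sketch that builds a scratch directory with File::Temp and runs the same four -f tests against it. The directory and file names (api, guide, empty) and the helper touch are made up for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Same lookup order as above: sibling page first, then index page inside.
sub find_homepage {
    my ($dir) = @_;
    return unless defined $dir && length $dir;
    for my $suffix ('.htm', '.html', '/index.htm', '/index.html') {
        my $path = "$dir$suffix";
        return $path if -f $path;
    }
    return;
}

# Hypothetical helper: create an empty file.
sub touch {
    my ($path) = @_;
    open my $fh, '>', $path or die "Cannot create $path: $!";
    close $fh;
}

my $root = tempdir(CLEANUP => 1);
mkdir "$root/$_" or die "mkdir $_ failed: $!" for qw(api guide empty);
touch("$root/api.html");          # sibling page named after the directory
touch("$root/guide/index.html");  # index page inside the directory

for my $name (qw(api guide empty)) {
    my $page = find_homepage("$root/$name");
    print "$name => ", (defined $page ? $page : '(no associated page)'), "\n";
}
```

Only four file tests are made per directory, so the cost stays constant no matter how many files the directory holds.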
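4. Adding anchor contents to the index
An object or a small function package is often documented in a single HTML file. Links to in-page anchors (href values of the form "#anchor-name") are therefore picked out and their text written to the index, so that a method name leads straight to its description.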
my $tree = HTML::TreeBuilder->new_from_file($file);
return unless defined $tree;
my @anchor = $tree->look_down("href", qr/^#\w+/);
$tree->delete(), return unless @anchor;
my %anchor;
foreach my $anchor (@anchor) {
my $href = $anchor->attr("href");
my $href_title = $anchor->as_text();
$href_title =~ s/^\s+|\s+$//g;
next unless $href && $href_title;
next if $anchor{$href} && $href ne "#$href_title";
$anchor{$href} = $href_title;
}
print HHK hhk_sub_in($pad);
my $pad2 = "$pad\t";
foreach my $href (keys %anchor) {
print HHK hhk_page($file.$href, $anchor{$href}, $pad2);
}
print HHK hhk_sub_out($pad);
foreach my $href (keys %anchor) {
print HHK hhk_page($file.$href, $anchor{$href}, $pad2);
}
$tree->delete();
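The look_down method of HTML::Element filters out the anchor links in an HTML file.
The CHM index is sorted automatically, and nobody wants to see a run of anchor entries with similar or even identical titles, so the anchors are filtered: each href is kept once, and a link whose text equals the anchor name takes precedence over other links to the same anchor.
Note that the anchors are inserted into the index twice. The CHM index is actually a tree (although it is usually used as a flat list): one insertion goes to the top level, so a method can be looked up directly, and the other goes into the sub-index of the file it came from, so it can be found by drilling down.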
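The filtering rule is easier to follow on plain data. This sketch applies the same two tests to a list of [href, text] pairs instead of HTML::Element objects; the function name filter_anchors and the sample links are hypothetical.

```perl
use strict;
use warnings;

# Keep one title per in-page href. A later link replaces an earlier one
# only when its text is exactly the anchor name (href eq "#text").
sub filter_anchors {
    my (@links) = @_;            # each element is [href, text]
    my %anchor;
    for my $link (@links) {
        my ($href, $text) = @$link;
        next unless defined $href && defined $text;
        $text =~ s/^\s+|\s+$//g;
        next unless $href =~ /^#\w+/ && length $text;
        next if $anchor{$href} && $href ne "#$text";
        $anchor{$href} = $text;
    }
    return %anchor;
}

my %anchor = filter_anchors(
    ['#open',  'open a file'],   # first link to #open is stored
    ['#open',  'see open'],      # ignored: #open already has a title
    ['#open',  'open'],          # replaces it: text equals the anchor name
    ['#close', ' close '],       # stored after trimming
    ['#top',   ''],              # ignored: no text
    ['http://example.com/', 'external'],  # ignored: not an in-page anchor
);
print "$_ => $anchor{$_}\n" for sort keys %anchor;
```

The output lists "#close => close" and "#open => open", so the index ends up with one short entry per anchor.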
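5. The CHM contents tree does not distinguish directories from pages, ignores case, and is sorted alphabetically
This is designed for help systems with many directories and files. The result may look a little untidy, but the program cannot work out the logical relationships among directories and pages automatically, so alphabetical order is the scheme that makes content easiest to find.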
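As a small illustration of this ordering, the sketch below sorts a mix of directory and page titles with a case-insensitive comparison. The titles are invented, and the second comparison is an addition that only keeps the order deterministic when two titles differ in case alone.

```perl
use strict;
use warnings;

# Order contents-tree titles without regard to case.
sub toc_order {
    my (@titles) = @_;
    return sort { lc $a cmp lc $b or $a cmp $b } @titles;
}

# Directories (api, zlib) and pages (Building, Overview) are mixed freely.
my @titles = ('zlib', 'Overview', 'api', 'Building');
print join(', ', toc_order(@titles)), "\n";   # api, Building, Overview, zlib
print join(', ', sort @titles), "\n";         # Building, Overview, api, zlib
```

A plain sort puts every capitalized title ahead of the lowercase ones, which is exactly what makes a long tree hard to scan.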
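6. Customizing the CHM main window
To enable all the toolbar buttons and navigation panes, you need to define a custom window.
Insert into the [OPTIONS] section of the hhp file:
Default Window=main
Insert into the [WINDOWS] section: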
main="framework","index.hhc","index.hhk","index.html","index.html",,,,,0x23520,,0x10387e,,,,,,,,0
The meaning of each parameter is described at: http://www.nongnu.org/chmspec/latest/INI.html#HHP
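The two hexadecimal fields in the window line are bitwise ORs of individual flags. As a sketch, the following recomputes them from named flags, using the flag values that appear in the tables of the full source. The helper flags and the key name locate are my own; the helper dies on an unknown name, because a misspelled hash key would otherwise contribute nothing without any warning.

```perl
use strict;
use warnings;

# Flag values as listed in the tables of the full source.
my %style = (
    'tri-pane' => 0x20,  'sync topic'  => 0x100,  'search' => 0x400,
    'favorites' => 0x1000, 'merge title' => 0x2000, 'fts ui' => 0x20000,
);
my %btn = (
    hide => 0x2, back => 0x4, forward => 0x8, stop => 0x10, refresh => 0x20,
    home => 0x40, locate => 0x800, options => 0x1000, print => 0x2000,
    font => 0x100000,
);

# OR together the named flags; refuse names that are not in the table.
sub flags {
    my ($table, @names) = @_;
    my $bits = 0;
    for my $name (@names) {
        die "Unknown flag '$name'\n" unless exists $table->{$name};
        $bits |= $table->{$name};
    }
    return $bits;
}

printf "styles:  0x%x\n", flags(\%style, keys %style);   # 0x23520
printf "buttons: 0x%x\n", flags(\%btn, keys %btn);       # 0x10387e
```

The two results match the values 0x23520 and 0x10387e in the window line above.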
The complete source code follows:
<textarea cols="50" rows="15" name="code" class="python">@rem = '
perl %0 %*
goto :end
';
#
# This script exports a directory of HTML help files into a chm file.
#
use strict;
use HTML::TreeBuilder;
my $dir = $ARGV[0];
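# Strip the parent path (ending in \ or /) from $dir and work inside it,
# so that every path written to the project files is relative.
chdir($&) if $dir =~ s/^.*[\\\/]//;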
my $homepage = get_homepage($dir);
my $prjname = get_htmltitle($homepage, $dir);
print qq(Now building Compressed HTML Help project for dir "$dir" as "$prjname"...\n);
open HHP, ">$prjname.hhp" or die "Failed to open $prjname.hhp: $!";
open HHC, ">$prjname.hhc" or close HHP, die "Failed to open $prjname.hhc: $!";
open HHK, ">$prjname.hhk" or close HHP, close HHC, die "Failed to open $prjname.hhk: $!";
print HHP hhp_head();
print HHC hhc_head();
print HHK hhk_head();
proc_dir($dir);
print HHP hhp_end();
print HHC hhc_end();
print HHK hhk_end();
close HHP;
close HHC;
close HHK;
print qq(Compressed HTML Help project for dir "$dir" as "$prjname" built.\n);
system('"C:/Program Files/HTML Help Workshop/hhc.exe" '.qq("$prjname.hhp"));
exit;
sub get_homepage($) {
my ($dir) = @_;
return unless $dir;
my @suffix = ('.htm', '.html', '/index.htm', '/index.html' );
foreach my $suffix (@suffix) {
my $path = "$dir$suffix";
return $path if -f $path;
}
return;
}
sub get_htmltitle($;$) {
my ($file, $default) = @_;
return $default unless $file;
my $tree = HTML::TreeBuilder->new_from_file($file);
return $default unless defined $tree;
my $title = $tree->find("title");
($title = $title->as_text()) =~ s/^\s+|\s+$//g if $title;
$tree->delete();
return $title? $title: $default;
}
sub file_need_display($) {
my ($name) = @_;
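# Show HTML pages, plus files whose names hold only word characters and spaces.
# "||" is required here: with "or", return would bind first and skip the second test.
return $name =~ /\.html?$/i || $name !~ /[^\w\s]/;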
}
sub proc_file($$;$) {
my ($file, $title, $pad) = @_;
return unless -f $file;
print qq(Processing "$file" as "$title"\n);
print HHC hhc_page($file, $title, $pad);
print HHK hhk_page($file, $title, $pad);
my $tree = HTML::TreeBuilder->new_from_file($file);
return unless defined $tree;
my @anchor = $tree->look_down("href", qr/^#\w+/);
$tree->delete(), return unless @anchor;
my %anchor;
foreach my $anchor (@anchor) {
my $href = $anchor->attr("href");
my $href_title = $anchor->as_text();
$href_title =~ s/^\s+|\s+$//g;
next unless $href && $href_title;
next if $anchor{$href} && $href ne "#$href_title";
$anchor{$href} = $href_title;
}
print HHK hhk_sub_in($pad);
my $pad2 = "$pad\t";
foreach my $href (keys %anchor) {
print HHK hhk_page($file.$href, $anchor{$href}, $pad2);
}
print HHK hhk_sub_out($pad);
foreach my $href (keys %anchor) {
print HHK hhk_page($file.$href, $anchor{$href}, $pad2);
}
$tree->delete();
return;
}
sub proc_dir($;$$$) {
my ($dir, $page, $title, $pad) = @_;
return unless -d $dir;
print qq(Processing "$dir" as "$title" with homepage of "$page"\n);
# Give up on this directory if it cannot be opened, instead of reading a bad handle.
opendir(my $dir_h, $dir) or (warn("Can't open dir $dir: $!\n"), return);
print HHC hhc_dir_in($page, $title, $pad) if defined $title;
my (%item, %used);
while (readdir $dir_h) {
next if /^[.]/; # Not process pseudo or hide dir and file.
my $path = "$dir/$_";
if (-f $path) {
print HHP "$path\n";
if (file_need_display($_) && not $used{$path}) {
my $item_title = get_htmltitle($path, $_);
next if $title eq $item_title;
$item{$item_title} = { file => $path };
}
}
elsif (-d $path){
my $homepage = get_homepage($path);
if ($homepage) {
my $item_title = get_htmltitle($homepage, $_);
$item{$item_title} = { dir => $path, page => $homepage };
$used{$homepage} = $path;
}
else {
$item{$_} = { dir => $path };
}
}
else {
print "Should not go in here: $path\n";
}
}
closedir $dir_h;
my $pad2 = "$pad\t";
foreach my $item_title (sort { lc $a cmp lc $b } keys %item) {
my $item = $item{$item_title};
if ($item->{dir}) {
proc_dir($item->{dir}, $item->{page}, $item_title, $pad2);
}
else {
proc_file($item->{file}, $item_title, $pad2);
}
}
print HHC hhc_dir_out($pad) if defined $title;
return;
}
sub hhc_page {
my ($path, $title, $pad) = @_;
return unless $title;
my $item = qq($pad<LI><OBJECT type="text/sitemap">)
.qq(<param name="Name" value="$title">)
.qq(<param name="Local" value="$path">)
.qq(<param name="ImageNumber" value="11">)
.qq(</OBJECT>\n);
return $item;
}
sub hhc_dir_in {
my ($path, $title, $pad) = @_;
return unless $title;
my $item = qq($pad<LI><OBJECT type="text/sitemap">)
.qq(<param name="Name" value="$title">)
.qq(<param name="Local" value="$path">)
.qq(<param name="ImageNumber" value="1">)
.qq(</OBJECT>\n)
.qq($pad<UL>\n);
return $item;
}
sub hhc_dir_out {
my ($pad) = @_;
return "$pad</UL>\n";
}
sub hhk_page {
my ($path, $title, $pad, @keyword) = @_;
return unless $title;
my $item = qq($pad<LI><OBJECT type="text/sitemap">)
.qq(<param name="Name" value="$title">)
.join('', map { qq(<param name="Name" value="$_">) } @keyword)
.qq(<param name="Local" value="$path">)
.qq(</OBJECT>\n);
return $item;
}
sub hhk_sub_in {
my ($pad) = @_;
return "$pad<UL>\n";
}
sub hhk_sub_out {
my ($pad) = @_;
return "$pad</UL>\n";
}
sub hhp_head{
my $title = $dir;
$title = get_htmltitle($homepage) if defined $homepage;
my %btn = (
'hide' => 0x00000002, # Hide/Show button hides/shows the navigation pane.
'back' => 0x00000004, # Back button.
'forward' => 0x00000008, # Forward button.
'stop' => 0x00000010, # Stop button.
'refresh' => 0x00000020, # Refresh button.
'home' => 0x00000040, # Home button.
'locale' => 0x00000800, # Locate button. Jumps to the current topic in the contents pane.
'options' => 0x00001000, # Options button.
'print' => 0x00002000, # Print button.
'jump1' => 0x00040000, # Jump 1 button. Customisable text - Arg 7. Either
'jump2' => 0x00080000, # Jump 2 button. Customisable text - Arg 9. Either
'font' => 0x00100000, # Font button. Changes the size of the text shown in the IE HTML display pane.
'next' => 0x00200000, # Next button. Jumps to the next topic in the contents pane. Requires "Binary TOC=Yes".
'prev' => 0x00400000 # Previous button. Jumps to the previous topic in the contents pane. Requires "Binary TOC=Yes".
);
my %style = (
'autohide' => 0x00000001, # Automatically hide/show tri-pane window: when the help window has focus the navigation pane is visible, otherwise it is hidden. Off
'ontop' => 0x00000002, # Keep the help window on top. Off
'notitlebar' => 0x00000004, # No title bar Off
'tri-pane' => 0x00000020, # Use a tri-pane window On
'hide toolbar text' => 0x00000040, # No text on toolbar buttons On
'post WM_QUIT' => 0x00000080, # Post WM_QUIT message when window closes Off
'sync topic' => 0x00000100, # When the current topic changes automatically sync contents and index. On
'search' => 0x00000400, # Include search tab in navigation pane On
'history' => 0x00000800, # Include history tab in navigation pane Off
'favorites' => 0x00001000, # Include favorites tab in navigation pane On
'merge title' => 0x00002000, # Put current HTML title in title bar On
'hide toolbar' => 0x00008000, # Don't display a toolbar Off
'msdnmenu' => 0x00010000, # MSDN Menu Off
'fts ui' => 0x00020000, # Advanced FTS UI. On
'allow resize' => 0x00040000, # After initial creation, user controls window size/position On
'has margin' => 0x10000000 # The window type has a margin
);
my @mainw = (
qq("$title"), #1 The title bar text.
qq("$prjname.hhc"), #2 The TOC file.
qq("$prjname.hhk"), #3 The Index file.
qq("$homepage"), #4 The Default file.
qq("$homepage"), #5 The file shown when the Home button is pressed.
undef, #6 The file shown when the Jump 1 button is pressed.
undef, #7 The text of the Jump 1 button.
undef, #8 The file shown when the Jump 2 button is pressed.
undef, #9 The text of the Jump 2 button.
#10 A bit field of navigation pane styles.
$style{'tri-pane'}|$style{'sync topic'}|$style{search}|$style{history}|$style{favorites}|$style{'merge title'}|$style{'fts ui'}|$style{'allow resize'}|$style{'has margin'},
undef, #11 Width of the navigation pane in pixels.
#12 A bit field of the buttons to show.
$btn{hide}|$btn{back}|$btn{forward}|$btn{stop}|$btn{home}|$btn{locale}|$btn{prev}|$btn{next}|$btn{options}|$btn{font}|$btn{print},
undef, #13 Initial position of the window on the screen: [left, top, right, bottom].
undef, #14 Style Flags. As set in the Win32 SetWindowLong & CreateWindow APIs.
undef, #15 Extended Style Flags. As set in the Win32 SetWindowLong & CreateWindowEx APIs.
undef, #16 Window show state. As set in the Win32 ShowWindow API.
1, #17 Whether or not the navigation pane is initially closed. 1 = closed, 0 = open
0, #18 The default navigation pane. 0 = TOC, 1 = Index, 2 = Search, 3 = Favorites, 4 = History (not implemented by HH), 5 = Author, 11-19 = Custom panes.
0, #19 Where the navigation pane tabs should be. 0 = Top, 1 = Left, 2 = Bottom & anything else is invalid (I imagine 3 = Right).
0 #20 ID to send in WM_NOTIFY messages.
);
my $mainw = join ',', @mainw;
return << "END-OF-HHP";
[OPTIONS]
Binary TOC=Yes
Compatibility=1.1 or later
Compiled file=$prjname.chm
Contents file=$prjname.hhc
Default Window=main
Default topic=$homepage
Display compile progress=Yes
Error log file=$prjname.log
Full-text search=Yes
Index file=$prjname.hhk
Title=$title
[WINDOWS]
main=$mainw
[FILES]
END-OF-HHP
}
sub hhp_end {
return << "END-OF-HHP";
[INFOTYPES]
END-OF-HHP
}
sub hhc_head {
return << "END-OF-HHC";
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML>
<HEAD>
<meta name="GENERATOR" content="Microsoft® HTML Help Workshop 4.1">
<!-- Sitemap 1.0 -->
</HEAD>
<BODY>
<OBJECT type="text/site properties">
<param name="ImageType" value="Folder">
</OBJECT>
<UL>
END-OF-HHC
}
sub hhc_end{
return << "END-OF-HHC";
</UL>
</BODY>
</HTML>
END-OF-HHC
}
sub hhk_head {
return << "END-OF-HHK";
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML>
<HEAD>
<meta name="GENERATOR" content="Microsoft® HTML Help Workshop 4.1">
<!-- Sitemap 1.0 -->
</HEAD>
<BODY>
<UL>
END-OF-HHK
}
sub hhk_end {
return << "END-OF-HHK";
</UL>
</BODY>
</HTML>
END-OF-HHK
}
__END__
:end
pause