
Exporting an HTML help directory to a CHM file

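A lot of software ships with a set of HTML help files. Browse them directly in a web browser and the biggest problem soon shows: there is no full-text search. Unless you already know which page holds what you are looking for, the content is hard to find.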

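CHM (Compressed HTML Help) is a help file format that Microsoft introduced in 1998. Development was later discontinued, so the last usable version is 1.3, but the main features are all there: packaging, compression, a table of contents, an index tree, and full-text search.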

The binary structure of CHM files and the contents of their configuration files are documented at:

http://www.nongnu.org/chmspec/latest/index.html

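Generating a CHM file from HTML takes at least three configuration files (.hhp, the project file; .hhc, the table of contents; .hhk, the index), which are then compiled with HTML Help Workshop.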
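
To give a feel for how small a project file can be, here is a minimal sketch that builds a bare .hhp as a string. The sub name minimal_hhp and the sample file names are invented for this illustration; the option names are the ones the full script at the end of this article writes.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text of a bare-bones .hhp project file.
# $name is the project base name, $topic the start page, and @files the
# HTML files to compile into the CHM.
sub minimal_hhp {
    my ($name, $topic, @files) = @_;
    my $hhp = join "\n",
        '[OPTIONS]',
        "Compiled file=$name.chm",
        "Contents file=$name.hhc",
        "Index file=$name.hhk",
        "Default topic=$topic",
        'Full-text search=Yes',
        '',
        '[FILES]',
        @files;
    return "$hhp\n";
}

print minimal_hhp('manual', 'index.html', 'index.html', 'api/list.html');
```

The .hhc and .hhk files named in [OPTIONS] still have to exist next to the project file before the compiler is run.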

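I tried several tools from the web that export an HTML help directory to a CHM file, but I was not happy with the titles they displayed. The indexes they generated were just HTML file names, which is nearly useless. So I decided to write a tool of my own.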

Perl is my first choice for small tools. If you do not like Perl, you can follow the approach and write the code yourself in another language.

Operating system

Windows. CHM viewers are available on other platforms as well.

Runtime environment

Perl: on Windows you can install ActivePerl or Strawberry Perl; both are free.

HTML Help Workshop: installing some versions of Visual Studio also installs HTML Help Workshop under "C:\Program Files". If it is not on your machine, a web search will turn up a standalone installer.
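
Since the compiler's location can differ between machines, a script may want to look for hhc.exe before calling it. The sketch below is one possible way to do that; find_hhc is a made-up helper, and the candidate directories (taken from the ProgramFiles and ProgramFiles(x86) environment variables) are assumptions about a typical installation rather than something the script in this article does.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first candidate path that exists as a
# file, or undef if none of them does.
sub find_hhc {
    my @candidates = @_;
    for my $path (@candidates) {
        return $path if defined($path) && -f $path;
    }
    return;
}

# Assumed install locations; adjust them for your machine.
my @guesses = map  { "$_/HTML Help Workshop/hhc.exe" }
              grep { defined($_) && length($_) }
              @ENV{'ProgramFiles', 'ProgramFiles(x86)'};

my $hhc = find_hhc(@guesses);
print defined($hhc) ? "Found compiler: $hhc\n" : "hhc.exe not found\n";
```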

1. Wrapping the script as a bat file

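The tool is meant to be used on Windows, and wrapping it as a bat file makes it easy to call from the right-click menu. Non-Windows users can skip this part.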

@rem = '

perl %0 %*

goto :end

';

# The real Perl code starts here

__END__

:end

pause

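In a bat file, the leading "@rem" suppresses echoing and turns the line into a comment. In Perl, lines 1 to 4 are parsed as a list assignment to the array @rem. Be careful not to put any other Perl code, such as "#!" or "use strict", before the first line, or you will get errors.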

The second line runs perl.exe to do the real work; "%0 %*" stands for the script file name and the arguments passed to the script.

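The third line jumps to the end of the file. Because a bat file is only syntax-checked as it executes, everything between this line and the ":end" line can use a syntax other than bat (Perl here).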

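The final pause keeps the window open so that you can read the command-line output. If you do not need it, put rem in front of it to disable it.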

2. Using HTML titles as the node names of the CHM contents tree

HTML::TreeBuilder and HTML::Element are used to extract the content of the title tag from an HTML file:

 my $tree = HTML::TreeBuilder->new_from_file($file);

return $default unless defined $tree;

my $title = $tree->find("title");

($title = $title->as_text()) =~ s/^\s+// if $title;

$tree->delete();

The second line returns the default title passed to the subroutine when the file cannot be parsed as HTML.

The fourth line takes the text of the title tag and strips its leading whitespace.
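
The same idea can be tried without any files on disk. The following is a small sketch that parses an in-memory string with new_from_content; the sub name html_title and the sample markup are invented for this example, and unlike the snippet above it trims whitespace at both ends of the title.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sketch of the title lookup on an in-memory document.
# html_title() is a hypothetical name; the article's script reads files.
sub html_title {
    my ($html, $default) = @_;
    my $tree  = HTML::TreeBuilder->new_from_content($html);
    my $node  = $tree->find('title');          # first <title> element, if any
    my $title = $node ? $node->as_text : '';
    $title =~ s/^\s+|\s+$//g;                  # trim both ends
    $tree->delete;                             # free the parse tree
    return length($title) ? $title : $default;
}

print html_title('<html><head><title>  Getting Started </title></head></html>',
                 'start.html'), "\n";
print html_title('<p>no title here</p>', 'fallback.html'), "\n";
```

Falling back to the file name keeps pages without a title visible in the contents tree instead of dropping them.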

3. Associating directories with index pages

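The entry point of an HTML help system is usually index.html. In addition, some help systems associate a directory with an HTML file of the same name. We show these associated HTML files in the CHM contents tree in place of the directories.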

my @suffix = ('.htm', '.html', '/index.htm', '/index.html' );

foreach my $suffix (@suffix) {

my $path = "$dir$suffix";

return $path if -f $path;

}

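This code looks for the associated HTML file in a fixed order: first an HTML file that has the same name as the directory and sits next to it, then index.htm or index.html inside the directory.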

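glob and File::Find are not used, mainly because a help system can contain a huge number of directories and files, and calling those search functions very frequently sometimes produced baffling errors.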
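
To see the lookup order in action, the sketch below wraps the loop in a sub and runs it against a throwaway directory tree created with File::Temp. The sub name find_homepage and the guide directory are made up for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Same lookup order as the snippet above, wrapped in a sub for the demo.
sub find_homepage {
    my ($dir) = @_;
    for my $suffix ('.htm', '.html', '/index.htm', '/index.html') {
        my $path = "$dir$suffix";
        return $path if -f $path;
    }
    return;
}

# Build a throwaway tree: guide/ contains index.html, and guide.html
# sits beside the directory.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/guide" or die "mkdir failed: $!";
for my $file ("$root/guide/index.html", "$root/guide.html") {
    open my $fh, '>', $file or die "Cannot create $file: $!";
    close $fh;
}

# The sibling guide.html wins because '.html' is tried before '/index.html'.
print find_homepage("$root/guide"), "\n";
```

Each candidate costs a single -f test, so the check stays cheap even when it runs once per directory.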

4. Adding anchor tag contents to the index

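An object or a small function package is often documented in a single HTML file. By picking out the links that point at anchors within the page (hrefs of the form "#anchor-name") and writing them to the index, a method name in the index leads straight to its description.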

 my $tree = HTML::TreeBuilder->new_from_file($file);

return unless defined $tree;

my @anchor = $tree->look_down("href", qr/^#\w+/);

$tree->delete(), return unless @anchor;

my %anchor;

foreach my $anchor (@anchor) {

my $href = $anchor->attr("href");

my $href_title = $anchor->as_text();

$href_title =~ s/^\s+//;

next unless $href && $href_title;

next if $anchor{$href} && $href ne "#$href_title";

$anchor{$href} = $href_title;

}

print HHK hhk_sub_in($pad);

my $pad2 = "$pad\t";

foreach my $href (keys %anchor) {

print HHK hhk_page($file.$href, $anchor{$href}, $pad2);

}

print HHK hhk_sub_out($pad);

foreach my $href (keys %anchor) {

print HHK hhk_page($file.$href, $anchor{$href}, $pad2);

}

$tree->delete();

The look_down method of HTML::Element picks out the anchor links in an HTML file.

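The CHM index is sorted automatically, and nobody wants to see a run of anchor entries with similar or even identical titles, so the anchors are filtered. Only one title is kept per anchor: the first link text found, unless a later link's text is exactly the anchor name, in which case that text replaces it.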

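Note that each anchor is written to the index twice. The CHM index is really a tree (although it is usually treated as a flat list): one copy goes into the sub-index of the file it came from, for level-by-level browsing, and one goes in at the top level so that a method can be looked up directly.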
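
The filtering rule is easier to follow on plain data. The sketch below applies it to a list of [href, link text] pairs instead of HTML::Element objects; the helper name pick_anchor_titles and the sample links are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: takes [href, link text] pairs in document order and
# returns a hash of href => title, using the same filter as the script.
sub pick_anchor_titles {
    my @links = @_;
    my %anchor;
    for my $link (@links) {
        my ($href, $text) = @$link;
        next unless defined($href) && defined($text);
        $text =~ s/^\s+//;                     # drop leading whitespace
        next unless length($href) && length($text);
        # Keep the first title seen, unless this link text is the anchor
        # name itself.
        next if $anchor{$href} && $href ne "#$text";
        $anchor{$href} = $text;
    }
    return %anchor;
}

my %index = pick_anchor_titles(
    ['#open',  'Opening a file'],
    ['#open',  'see above'],    # ignored: #open already has a title
    ['#open',  'open'],         # replaces it: text equals the anchor name
    ['#close', ' close'],
);
print "$_ => $index{$_}\n" for sort keys %index;
```

Preferring the link whose text matches the anchor name tends to leave method names, rather than surrounding prose, in the index.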

5. The CHM contents tree does not distinguish directories from pages, ignores case, and sorts alphabetically

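This is designed for help systems with many directories and files. It may not look pretty, but since a program cannot work out the logical relationships among directories and pages, alphabetical order is the easiest way to find things.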
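
A quick sketch of the ordering, using the same comparison as the full script; the titles are invented sample data.

```perl
use strict;
use warnings;

# Sample titles. In the script, directories and pages share one hash keyed
# by title, so they are sorted together.
my @titles = ('zlib notes', 'API Reference', 'appendix', 'Build Guide');

# Compare lower-cased copies so that case does not affect the order.
my @sorted = sort { lc $a cmp lc $b } @titles;
print "$_\n" for @sorted;
```

A plain sort would put every capitalized title ahead of the lower-case ones, which scatters related entries.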

6. Customizing the CHM main window

Enabling all the toolbar buttons and navigation panes requires a custom window.

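Insert the following into the [OPTIONS] section of the hhp file. The window name (main in this example) must match the entry defined under [WINDOWS]:

Default Window=main

Insert into the [WINDOWS] section: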

main="framework","index.hhc","index.hhk","index.html","index.html",,,,,0x23520,,0x10387e,,,,,,,,0

The meaning of each parameter is described at: http://www.nongnu.org/chmspec/latest/INI.html#HHP
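
The two hexadecimal fields in the sample line are bit masks. As a sanity check, the sketch below rebuilds them from the individual flag values that appear in the comments of the full script; the combine helper and the key names are chosen for this example only.

```perl
use strict;
use warnings;

# Navigation pane style flags that make up 0x23520.
my %style = (
    'tri-pane'    => 0x00000020,
    'sync topic'  => 0x00000100,
    'search'      => 0x00000400,
    'favorites'   => 0x00001000,
    'merge title' => 0x00002000,
    'fts ui'      => 0x00020000,
);

# Toolbar button flags that make up 0x10387e.
my %btn = (
    'hide'    => 0x00000002, 'back'    => 0x00000004,
    'forward' => 0x00000008, 'stop'    => 0x00000010,
    'refresh' => 0x00000020, 'home'    => 0x00000040,
    'locate'  => 0x00000800, 'options' => 0x00001000,
    'print'   => 0x00002000, 'font'    => 0x00100000,
);

# OR a list of flag values together.
sub combine {
    my $bits = 0;
    $bits |= $_ for @_;
    return $bits;
}

printf "navigation styles: 0x%x\n", combine(values %style);
printf "toolbar buttons:   0x%x\n", combine(values %btn);
```

Building the masks from named flags makes it straightforward to add or drop a single button without recomputing the hexadecimal value by hand.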

The complete source code follows:

@rem = '

perl %0 %*

goto :end

';

#

# This script exports a directory of HTML help files into a CHM file.

#

use strict;

use HTML::TreeBuilder;

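my $dir = $ARGV[0];

die "Usage: $0 <html-help-directory>\n" unless defined($dir) && length($dir);

# Strip the parent path (up to the last slash or backslash) and work from there.

chdir($&) if $dir =~ s/^.*[\\\/]//;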

my $homepage = get_homepage($dir);

my $prjname = get_htmltitle($homepage, $dir);

print qq(Now building Compressed HTML Help project for dir "$dir" as "$prjname"...\n);

open HHP, ">$prjname.hhp" or die "Failed to open >$prjname.hhp cause $!";

open HHC, ">$prjname.hhc" or close HHP, die "Failed to open >$prjname.hhc cause $!";

open HHK, ">$prjname.hhk" or close HHP, close HHC, die "Failed to open >$prjname.hhk cause $!";

print HHP hhp_head();

print HHC hhc_head();

print HHK hhk_head();

proc_dir($dir);

print HHP hhp_end();

print HHC hhc_end();

print HHK hhk_end();

close HHP;

close HHC;

close HHK;

print qq(Compressed HTML Help project for dir "$dir" as "$prjname" built.\n);

system('"C:/Program Files/HTML Help Workshop/hhc.exe" '.qq("$prjname.hhp"));

exit;

sub get_homepage($) {

my ($dir) = @_;

return unless $dir;

my @suffix = ('.htm', '.html', '/index.htm', '/index.html' );

foreach my $suffix (@suffix) {

my $path = "$dir$suffix";

return $path if -f $path;

}

return;

}

sub get_htmltitle($;$) {

my ($file, $default) = @_;

return $default unless $file;

my $tree = HTML::TreeBuilder->new_from_file($file);

return $default unless defined $tree;

my $title = $tree->find("title");

($title = $title->as_text()) =~ s/^\s+// if $title;

$tree->delete();

return $title? $title: $default;

}

sub file_need_display($) {

my ($name) = @_;

return $name =~ /\.html?$/i || $name !~ /[^\w\s]/;

}

sub proc_file($$;$) {

my ($file, $title, $pad) = @_;

return unless -f $file;

print qq(Processing "$file" as "$title"\n);

print HHC hhc_page($file, $title, $pad);

print HHK hhk_page($file, $title, $pad);

my $tree = HTML::TreeBuilder->new_from_file($file);

return unless defined $tree;

my @anchor = $tree->look_down("href", qr/^#\w+/);

$tree->delete(), return unless @anchor;

my %anchor;

foreach my $anchor (@anchor) {

my $href = $anchor->attr("href");

my $href_title = $anchor->as_text();

$href_title =~ s/^\s+//;

next unless $href && $href_title;

next if $anchor{$href} && $href ne "#$href_title";

$anchor{$href} = $href_title;

}

print HHK hhk_sub_in($pad);

my $pad2 = "$pad\t";

foreach my $href (keys %anchor) {

print HHK hhk_page($file.$href, $anchor{$href}, $pad2);

}

print HHK hhk_sub_out($pad);

foreach my $href (keys %anchor) {

print HHK hhk_page($file.$href, $anchor{$href}, $pad2);

}

$tree->delete();

return;

}

sub proc_dir($;$$$) {

my ($dir, $page, $title, $pad) = @_;

return unless -d $dir;

print qq(Processing "$dir" as "$title" with homepage of "$page"\n);

opendir(my $dir_h, $dir) or warn "Can't open dir $dir: $!\n";

print HHC hhc_dir_in($page, $title, $pad) if defined $title;

my (%item, %used);

while (readdir $dir_h) {

next if /^[.]/; # Skip pseudo and hidden directories and files.

my $path = "$dir/$_";

if (-f $path) {

print HHP "$path\n";

if (file_need_display($_) && not $used{$path}) {

my $item_title = get_htmltitle($path, $_);

next if $title eq $item_title;

$item{$item_title} = { file => $path };

}

}

elsif (-d $path){

my $homepage = get_homepage($path);

if ($homepage) {

my $item_title = get_htmltitle($homepage, $_);

$item{$item_title} = { dir => $path, page => $homepage };

$used{$homepage} = $path;

}

else {

$item{$_} = { dir => $path };

}

}

else {

print "Should not go in here: $path\n";

}

}

closedir $dir_h;

my $pad2 = "$pad\t";

foreach my $item_title (sort { lc $a cmp lc $b } keys %item) {

my $item = $item{$item_title};

if ($item->{dir}) {

proc_dir($item->{dir}, $item->{page}, $item_title, $pad2);

}

else {

proc_file($item->{file}, $item_title, $pad2);

}

}

print HHC hhc_dir_out($pad) if defined $title;

return;

}

sub hhc_page {

my ($path, $title, $pad) = @_;

return unless $title;

my $item = qq($pad<LI><OBJECT type="text/sitemap">)

.qq(<param name="Name" value="$title">)

.qq(<param name="Local" value="$path">)

.qq(<param name="ImageNumber" value="11">)

.qq(</OBJECT>\n);

return $item;

}

sub hhc_dir_in {

my ($path, $title, $pad) = @_;

return unless $title;

my $item = qq($pad<LI><OBJECT type="text/sitemap">)

.qq(<param name="Name" value="$title">)

.qq(<param name="Local" value="$path">)

.qq(<param name="ImageNumber" value="1">)

.qq(</OBJECT>\n)

.qq($pad<UL>\n);

return $item;

}

sub hhc_dir_out {

my ($pad) = @_;

return "$pad</UL>\n";

}

sub hhk_page {

my ($path, $title, $pad, @keyword) = @_;

return unless $title;

my $item = qq($pad<LI><OBJECT type="text/sitemap">)

.qq(<param name="Name" value="$title">)

.join('', map { qq(<param name="Name" value="$_">) } @keyword)

.qq(<param name="Local" value="$path">)

.qq(</OBJECT>\n);

return $item;

}

sub hhk_sub_in {

my ($pad) = @_;

return "$pad<UL>\n";

}

sub hhk_sub_out {

my ($pad) = @_;

return "$pad</UL>\n";

}

sub hhp_head{

my $title = $dir;

$title = get_htmltitle($homepage) if defined $homepage;

my %btn = (

'hide' => 0x00000002, # Hide/Show button hides/shows the navigation pane.

'back' => 0x00000004, # Back button.

'forward' => 0x00000008, # Forward button.

'stop' => 0x00000010, # Stop button.

'refresh' => 0x00000020, # Refresh button.

'home' => 0x00000040, # Home button.

'locale' => 0x00000800, # Locate button. Jumps to the current topic in the contents pane.

'options' => 0x00001000, # Options button.

'print' => 0x00002000, # Print button.

'jump1' => 0x00040000, # Jump 1 button. Customisable text - Arg 7. Either

'jump2' => 0x00080000, # Jump 2 button. Customisable text - Arg 9. Either

'font' => 0x00100000, # Font button. Changes the size of the text shown in the IE HTML display pane.

'next' => 0x00200000, # Next button. Jumps to the next topic in the contents pane. Requires "Binary TOC=Yes".

'prev' => 0x00400000 # Previous button. Jumps to the previous topic in the contents pane. Requires "Binary TOC=Yes".

);

my %style = (

'autohide' => 0x00000001, # Automatically hide/show tri-pane window: when the help window has focus the navigation pane is visible, otherwise it is hidden. Off

'ontop' => 0x00000002, # Keep the help window on top. Off

'notitlebar' => 0x00000004, # No title bar Off

'tri-pane' => 0x00000020, # Use a tri-pane window On

'hide toolbar text' => 0x00000040, # No text on toolbar buttons On

'post WM_QUIT' => 0x00000080, # Post WM_QUIT message when window closes Off

'sync topic' => 0x00000100, # When the current topic changes automatically sync contents and index. On

'search' => 0x00000400, # Include search tab in navigation pane On

'history' => 0x00000800, # Include history tab in navigation pane Off

'favorites' => 0x00001000, # Include favorites tab in navigation pane On

'merge title' => 0x00002000, # Put current HTML title in title bar On

'hide toolbar' => 0x00008000, # Don't display a toolbar Off

'msdnmenu' => 0x00010000, # MSDN Menu Off

'fts ui' => 0x00020000, # Advanced FTS UI. On

'allow resize' => 0x00040000, # After initial creation, user controls window size/position On

'has margin' => 0x10000000 # The window type has a margin

);

my @mainw = (

qq("$title"), #1 The title bar text.

qq("$prjname.hhc"), #2 The TOC file.

qq("$prjname.hhk"), #3 The Index file.

qq("$homepage"), #4 The Default file.

qq("$homepage"), #5 The file shown when the Home button is pressed.

undef, #6 The file shown when the Jump 1 button is pressed.

undef, #7 The text of the Jump 1 button.

undef, #8 The file shown when the Jump 2 button is pressed.

undef, #9 The text of the Jump 2 button.

#10 A bit field of navigation pane styles.

$style{'tri-pane'}|$style{'sync topic'}|$style{search}|$style{history}|$style{favorites}|$style{'merge title'}|$style{'fts ui'}|$style{'allow resize'}|$style{'has margin'},

undef, #11 Width of the navigation pane in pixels.

#12 A bit field of the buttons to show.

$btn{hide}|$btn{back}|$btn{forward}|$btn{stop}|$btn{home}|$btn{locale}|$btn{prev}|$btn{next}|$btn{options}|$btn{font}|$btn{print},

undef, #13 Initial position of the window on the screen: [left, top, right, bottom].

undef, #14 Style Flags. As set in the Win32 SetWindowLong & CreateWindow APIs.

undef, #15 Extended Style Flags. As set in the Win32 SetWindowLong & CreateWindowEx APIs.

undef, #16 Window show state. As set in the Win32 ShowWindow API.

1, #17 Whether or not the navigation pane is initially closed. 1 = closed, 0 = open

0, #18 The default navigation pane. 0 = TOC, 1 = Index, 2 = Search, 3 = Favorites, 4 = History (not implemented by HH), 5 = Author, 11-19 = Custom panes.

0, #19 Where the navigation pane tabs should be. 0 = Top, 1 = Left, 2 = Bottom & anything else is invalid (I imagine 3 = Right).

0 #20 ID to send in WM_NOTIFY messages.

);

my $mainw = join ',', @mainw;

return << "END-OF-HHP";

[OPTIONS]

Binary TOC=Yes

Compatibility=1.1 or later

Compiled file=$prjname.chm

Contents file=$prjname.hhc

Default Window=main

Default topic=$homepage

Display compile progress=Yes

Error log file=$prjname.log

Full-text search=Yes

Index file=$prjname.hhk

Title=$title

[WINDOWS]

main=$mainw

[FILES]

END-OF-HHP

}

sub hhp_end {

return << "END-OF-HHP";

[INFOTYPES]

END-OF-HHP

}

sub hhc_head {

return << "END-OF-HHC";

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">

<HTML>

<HEAD>

<meta name="GENERATOR" content="Microsoft® HTML Help Workshop 4.1">

<!-- Sitemap 1.0 -->

</HEAD>

<BODY>

<OBJECT type="text/site properties">

<param name="ImageType" value="Folder">

</OBJECT>

<UL>

END-OF-HHC

}

sub hhc_end{

return << "END-OF-HHC";

</UL>

</BODY>

</HTML>

END-OF-HHC

}

sub hhk_head {

return << "END-OF-HHK";

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">

<HTML>

<HEAD>

<meta name="GENERATOR" content="Microsoft® HTML Help Workshop 4.1">

<!-- Sitemap 1.0 -->

</HEAD>

<BODY>

<UL>

END-OF-HHK

}

sub hhk_end {

return << "END-OF-HHK";

</UL>

</BODY>

</HTML>

END-OF-HHK

}

__END__

:end

pause