Modifying Hugo's sitemap.xml to Improve SEO

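sitemap.xml is our website's map. It tells search engines what content the site has, so they can analyze it, index it, and return it to people who search. The sitemap.xml that Hugo generates by default also lists a large number of category, tag, archive, search, about, and links (blogroll) pages. The titles of these URLs are basically all the site title, their descriptions are all the site description, and their body content is essentially empty, so to a search engine they look like a large number of duplicate, low-value pages. That is not a good sign: a search engine may decide the site is a spam site and not index it, or even penalize it and drop it from search results, and then hardly anyone will find what we write through search.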
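
To see how much of a generated sitemap consists of such list pages, a small Python script can count the URLs by their first path segment. This is a minimal sketch assuming a single-language site, whose /sitemap.xml is a plain <urlset> (a multilingual Hugo site puts a sitemap index there instead); the function names are illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse

# Namespace of the <urlset> element defined by the sitemaps.org protocol
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_data):
    """Return the <loc> value of every <url> entry (xml_data may be str or bytes)."""
    root = ET.fromstring(xml_data)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


def count_by_section(urls):
    """Group URLs by first path segment, e.g. "/tags/"; the home page counts as "/"."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        key = f"/{segments[0]}/" if segments else "/"
        counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>https://example.com/</loc></url>
      <url><loc>https://example.com/tags/hugo/</loc></url>
      <url><loc>https://example.com/tags/seo/</loc></url>
      <url><loc>https://example.com/p/hello-world/</loc></url>
    </urlset>"""
    for section, n in count_by_section(sitemap_urls(sample)).most_common():
        print(section, n)
```

Against a real build, pass the bytes of public/sitemap.xml (public/ is Hugo's default output directory), for example sitemap_urls(open("public/sitemap.xml", "rb").read()).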

Customizing sitemap.xml

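What I wanted was a sitemap containing only my home page and the articles I have written, so that every link points to a useful page and the duplicate, low-value pages are gone. After reading about other users' experience online, I got it working. The changes are as follows: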

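Edit /config/_default/config.toml and add the setting below at the bottom. It is a custom site parameter listing the taxonomies ("tags" and "categories") to leave out of the sitemap; Hugo does not act on it by itself, it only takes effect because the sitemap template below reads it. If the file already has a [params] table, add the key to that table instead, since TOML does not allow the same table to be defined twice.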

[params]
taxonomiesExcludedFromSitemap = ["tags", "categories"]

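Create a sitemap.xml file in the /themes/stack/layouts folder with the following content (creating it in the site's own /layouts folder works too, and a file there is not overwritten when the theme is updated):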

{{ printf "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?>" | safeHTML }}
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">
  {{ range .Data.Pages }}
  {{ if and (not (in .Site.Params.taxonomiesExcludedFromSitemap .Data.Plural)) (not (strings.Contains .RelPermalink "/links")) (not (strings.Contains .RelPermalink "/page")) (not (strings.Contains .RelPermalink "/post")) (not (strings.Contains .RelPermalink "/archives")) (not (strings.Contains .RelPermalink "/search")) (not (strings.Contains .RelPermalink "/about")) }}
  <url>
    <loc>{{ .Permalink }}</loc>
    {{ if not .Lastmod.IsZero }}
    <lastmod>{{ safeHTML ( .Lastmod.Format "2006-01-02T15:04:05-07:00" ) }}</lastmod>
    {{ end }}
    {{ with .Sitemap.ChangeFreq }}
    <changefreq>{{ . }}</changefreq>
    {{ end }}

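    {{- /* Only when a priority is configured (Hugo's default -1 means none):
           the decayed value is 1.0 minus 0.1 per full week since Lastmod, capped at 1.0;
           the configured value is used whenever it is at least the decayed value */ -}}
    {{- if ge .Sitemap.Priority 0.0 -}}
    {{- $weeks := div (sub now.Unix .Lastmod.Unix) 604800 -}}
    {{- $priority := sub 1 (div $weeks 10.0) -}}
    {{- if ge .Sitemap.Priority $priority -}}
    <priority>{{ .Sitemap.Priority }}</priority>
    {{- else if ge $priority 1.0 -}}
    <priority>1.0</priority>
    {{- else -}}
    {{- /* printf avoids float noise such as 0.30000000000000004 */ -}}
    <priority>{{ printf "%.1f" $priority }}</priority>
    {{- end -}}
    {{- end -}}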

    {{ if .IsTranslated }}
    {{ range .Translations }}
    <xhtml:link
                rel="alternate"
                hreflang="{{ .Language.Lang }}"
                href="{{ .Permalink }}"
                />{{ end }}
    <xhtml:link
                rel="alternate"
                hreflang="{{ .Language.Lang }}"
                href="{{ .Permalink }}"
                />{{ end }}
  </url>
  {{ end }}
  {{ end }}
</urlset>
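
The priority branch of the template is its least obvious part: when a page has a priority configured (Hugo's default of -1 means none), the value starts at 1.0 for a freshly modified page and drops by 0.1 for every full week since Lastmod, and the configured value acts as a floor. The function below is a hypothetical Python port of that branch for checking the numbers by hand; this sketch rounds its result to one decimal place.

```python
import time

SECONDS_PER_WEEK = 604800  # same constant as in the template


def sitemap_priority(configured, lastmod_unix, now_unix=None):
    """Hypothetical Python port of the template's <priority> branch.

    Returns None when no priority is configured (negative; Hugo's default is -1).
    Returns the configured value if it is at least the decayed value
    1 - weeks / 10, otherwise the decayed value capped at 1.0.
    """
    if configured < 0:
        return None  # the template emits no <priority> element at all
    if now_unix is None:
        now_unix = int(time.time())
    # int() truncates toward zero, like Go's integer division behind `div`
    weeks = int((now_unix - lastmod_unix) / SECONDS_PER_WEEK)
    decayed = 1 - weeks / 10.0
    if configured >= decayed:
        return configured
    if decayed >= 1.0:
        return 1.0
    return round(decayed, 1)
```

For example, with a configured priority of 0.5, a page last modified three weeks ago gets 0.7; after seven weeks the decayed value (0.3) falls below the floor, so 0.5 is used.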

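The key part is the fifth line of the template, the {{ if and ... }} condition. Besides the "tags" and "categories" taxonomies, it also excludes every URL containing /links, /page (pagination pages), /post, /archives, /search or /about. Because strings.Contains is a plain substring test, an article whose URL happens to contain one of these fragments is dropped as well.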
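
The filter condition can be mirrored in a few lines of Python to check, before publishing, whether a given path would stay in the sitemap. This is a hypothetical sketch: the two lists copy the values used above, and plural stands in for .Data.Plural, which Hugo sets only on taxonomy and term pages.

```python
# Values copied from taxonomiesExcludedFromSitemap and the strings.Contains checks
EXCLUDED_TAXONOMIES = ["tags", "categories"]
EXCLUDED_FRAGMENTS = ["/links", "/page", "/post", "/archives", "/search", "/about"]


def in_sitemap(rel_permalink, plural=None):
    """Return True if the template's condition would emit a <url> for this page.

    plural plays the role of .Data.Plural: the taxonomy name ("tags", ...)
    on taxonomy/term pages, None on every other page.
    """
    if plural in EXCLUDED_TAXONOMIES:
        return False
    # strings.Contains is a plain substring test, just like Python's `in`
    return not any(fragment in rel_permalink for fragment in EXCLUDED_FRAGMENTS)


if __name__ == "__main__":
    samples = [("/", None), ("/tags/hugo/", "tags"), ("/page/2/", None),
               ("/p/hello-world/", None), ("/p/postgres-notes/", None)]
    for path, plural in samples:
        print(path, in_sitemap(path, plural))
```

The last sample path (a made-up article URL) shows the substring caveat: it is an ordinary article, yet it is excluded because it contains /post.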

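The original source has no robots.txt file. A robots.txt file tells search engine crawlers which URLs on the site they may access and can forbid them from fetching certain URLs. Although the sitemap above no longer lists these low-value URLs, the pages still exist on the site and crawlers sometimes reach them anyway, so a robots.txt file further restricts crawler behavior. Create robots.txt in the /static folder (Hugo copies everything in /static to the site root unchanged) with the content below. You can keep adding Disallow lines later for any other paths you don't want crawled. Note that Disallow blocks crawling rather than indexing: a blocked URL that other pages link to can still show up in search results, just without its content.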

Sitemap: https://www.flashspace.org/sitemap.xml

User-agent: *
Disallow: /tags/
Disallow: /categories/
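
Python's standard library ships a robots.txt parser, urllib.robotparser, which gives a quick way to confirm the rules behave as intended. A minimal sketch using the file above (the article URL is made up); this parser does simple path-prefix matching, so treat it as a sanity check rather than an exact model of any particular search engine's crawler.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from above
ROBOTS_TXT = """\
Sitemap: https://www.flashspace.org/sitemap.xml

User-agent: *
Disallow: /tags/
Disallow: /categories/
"""


def crawl_allowed(robots_txt, url, user_agent="*"):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ["https://www.flashspace.org/tags/hugo/",
                "https://www.flashspace.org/categories/notes/",
                "https://www.flashspace.org/p/hello-world/"]:
        print(url, crawl_allowed(ROBOTS_TXT, url))
```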

Done

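These three steps (the config parameter, the custom sitemap template, and robots.txt) give the site a basic structural SEO improvement: the site structure becomes simpler, and search engines can crawl and index it more easily.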
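
As a final check after rebuilding, it is worth confirming that no URL listed in the new sitemap is also blocked by robots.txt, since a page that is both submitted and disallowed sends search engines contradictory signals. A minimal sketch with the standard library; the sample data and the function name are illustrative.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace of the <urlset> element defined by the sitemaps.org protocol
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def blocked_sitemap_urls(sitemap_xml, robots_txt, user_agent="*"):
    """Return sitemap URLs that robots.txt disallows; ideally the list is empty."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    locs = [loc.text.strip()
            for loc in ET.fromstring(sitemap_xml).findall("sm:url/sm:loc", NS)
            if loc.text]
    return [url for url in locs if not robots.can_fetch(user_agent, url)]


if __name__ == "__main__":
    robots_txt = "User-agent: *\nDisallow: /tags/\nDisallow: /categories/\n"
    sitemap_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>https://example.com/</loc></url>
      <url><loc>https://example.com/p/hello-world/</loc></url>
    </urlset>"""
    print(blocked_sitemap_urls(sitemap_xml, robots_txt))
```

On a real site, read public/sitemap.xml and static/robots.txt from disk and pass their contents in.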