A sitemap is like a roadmap for search engines, guiding them through the pages on your website and helping engines like Google index them properly. In this post, we'll dive into what sitemaps are, why they're crucial for your site, and how to create one, taking Hubhandle.com as an example. Let's embark on this SEO journey together!
Think of it this way: if I give you a website URL like hubhandle.com, how would you know which pages are available on the site? You would probably click the links in the header and the navigation buttons, right? Each link leads you to another page, which might contain more links, and you keep clicking through. This is exactly how search engine crawlers work. They know your website's domain, and their bots crawl your pages, looking for links and following them until they hit a dead end with no links or land on a page they have already seen. They do this for every link on every page and build a map of your site.
What if your website's pages aren't linked properly, but you still want those "hidden pages" to appear in the search results? This is quite common on large and complex sites, where not every page is linked. But even on a simple site like hubhandle.com, the "hidden pages" problem can occur. When we launched hubhandle.com, we only had a landing page, and we designed it that way: a single landing page with no links to other pages. Recently, we launched our blogging platform, hubhandle.com/blog. This blog lives under a different path (/blog), and there was no way to reach it from the landing page. We had to manually submit the blog pages in Google Search Console and ask it to index them.
Sitemaps act as a guidebook for search engines, especially for those elusive "hidden pages". Even if you think the pages on your site are properly linked and there are no "hidden pages", you should still create a sitemap. It won't hurt: it provides additional information to search engines and is pretty easy to create.
A sitemap is a file that lists all the endpoints on your website that search engines need to crawl. It also provides additional information, such as how often a page's content changes and when it was last updated.
Sitemaps can be written in multiple formats. In this post, I will be talking about XML sitemaps, which is most probably the only format you will need. An XML file has an inverted tree-like structure: just as branches grow out of a stem and more branches grow out of those branches, the "stem" of the XML is the root, which contains "branches" that describe additional details.
<stem>
  <branch1>
    <property1>Value 1</property1>
    <property2>Value 2</property2>
  </branch1>
  <branch2>
    <property1>Value 1</property1>
    <property2>Value 2</property2>
  </branch2>
</stem>
This is an example of an XML file. As you can see, the stem contains two branches, each of which contains two properties with a value for each. The items enclosed in angle brackets, such as <stem>, are called tags, and every tag must have a matching closing tag like </stem>.
Let's put theory into practice and craft a sitemap for Hubhandle.com. Before creating a sitemap, you should know all the endpoints of your site. At the time of writing, hubhandle.com contains these endpoints:
Let's see a sitemap for the first link:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://hubhandle.com</loc>
    <lastmod>2024-02-12</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
The first line is metadata that describes the XML version and the encoding used in the sitemap. The endpoints of your site must be contained inside the <urlset>...</urlset> tag. Then you put the sitemap data for each endpoint inside a <url>...</url> tag. The <loc> tag contains the URL of the endpoint. <lastmod> describes when the endpoint was last modified. <changefreq> describes how often the page changes; the allowed values are "always", "hourly", "daily", "weekly", "monthly", "yearly", and "never". <priority> is a value between 0.0 and 1.0 that indicates how important this endpoint is compared to the other endpoints in the sitemap; a value of 1.0 means it has the highest importance. Have a look at hubhandle.com/sitemap.xml for reference, and refer to sitemaps.org for more details.
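Putting it all together, here is a sketch of what the sitemap could look like once more endpoints are added. The entry for hubhandle.com/blog is only an illustration: the lastmod date, changefreq, and priority values below are placeholders, not our actual settings.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Landing page: the most important endpoint, so priority 1.0 -->
  <url>
    <loc>https://hubhandle.com</loc>
    <lastmod>2024-02-12</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <!-- Blog index: placeholder values for illustration, adjust to your own site -->
  <url>
    <loc>https://hubhandle.com/blog</loc>
    <lastmod>2024-02-12</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>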
Now that you have written the sitemap for your site, it's time to validate it. There are different tools for validating XML; visit http://www.w3.org/XML/Schema#Tools or http://www.xml.com/pub/a/2000/12/13/schematools.html for a list of tools.
I am using xmlschema-validate on Linux. Download the XML schema from http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd; this schema tells the validation tool how to validate the XML. Then install the tool and run the validation:
$ sudo apt install python3-xmlschema
$ xmlschema-validate -v --schema sitemap.xsd sitemap.xml
sitemap.xml is valid
Hooray! Our sitemap is valid!
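If you prefer validating from a script instead of the command line, the same xmlschema package can be used from Python. Here is a minimal sketch, assuming the sitemap.xsd and sitemap.xml files from above are in the current directory:
import xmlschema

# Load the sitemap schema downloaded from sitemaps.org
schema = xmlschema.XMLSchema("sitemap.xsd")

# is_valid() returns True or False without raising
if schema.is_valid("sitemap.xml"):
    print("sitemap.xml is valid")
else:
    # validate() raises an exception describing exactly what is wrong
    schema.validate("sitemap.xml")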
Now that you have created a sitemap for your site, you need to tell search engines where the sitemap file is located. In most cases, sitemap files are served at domain.tld/sitemap.xml. "robots.txt" is another file that search engines use to decide which pages to crawl. We will write a separate blog post on robots.txt; follow Hubhandle on LinkedIn, X, or Instagram, or subscribe to the blog by filling in the form below to get notified.
You can add a "Sitemap" entry to the robots.txt. For instance, we have added this entry to our site's robots.txt:
Sitemap: https://hubhandle.com/sitemap.xml
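For context, a minimal robots.txt that allows crawling and advertises the sitemap could look like the sketch below. The User-agent and Disallow lines are a generic illustration, not a copy of our actual file; the Sitemap line is the part that matters here.
# Allow all crawlers to access everything
User-agent: *
Disallow:

# Tell crawlers where the sitemap lives
Sitemap: https://hubhandle.com/sitemap.xml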
By creating a sitemap and updating your robots.txt, you've paved the way for search engine crawlers to explore every nook and cranny of your website. Sit back, relax, and watch your pages get crawled and indexed. Happy indexing!