Search Engine Optimization (SEO) can feel overwhelming and confusing if you try to devour it all in one day. Taken in smaller chunks, however, it becomes easy, fun and interesting. You can master SEO if you are determined enough to learn at least one technique per day; it is really no big deal as long as you approach it in a disciplined manner.
This article deals with 2 important technical concepts that you must know as a webmaster or blogger: they can help your website secure better search rankings, avoid penalties from search engines, and reduce its exposure to potential threats such as hacking and intrusion.
If you have 2 or more web addresses pointing to the same content, search engines treat each of them as a separate page and may regard them as duplicate versions, which can harm your site's rankings. That is why you need to canonicalize your web addresses. Canonicalization is a way of telling the search engines which web page you prefer out of the multiple addresses pointing to the same, or very similar, content. I will explain this with an example.
Let’s imagine that your site’s home page can be accessed using 2 different web addresses, namely, http://www.yourwebsite.com and http://www.yourwebsite.com/home.htm
To a visitor both lead to the same page, but to the search engines they are two separate and distinct addresses, which can get your website flagged for hosting duplicate content. To avoid this hassle, you must explicitly tell the search engines which one is the canonical page, or preferred version (http://www.yourwebsite.com), and which one is non-canonical (http://www.yourwebsite.com/home.htm). You can accomplish this by adding a rel="canonical" link element to the <head> section of the non-canonical page (just after the opening <head> tag), pointing at the preferred address:
<link rel="canonical" href="http://www.yourwebsite.com"/>
When the search engines encounter this element in the <head> section of the non-canonical page, they know which address you prefer, give it priority over the non-canonical version, and typically index that preferred address in their search results database. Canonicalization is worth undertaking on your website/blog if you want to preserve its SEO benefits.
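To make the placement clear, here is a minimal sketch of what the <head> section of the non-canonical page (home.htm in our example) might look like; the title text is just a placeholder:

<head>
<title>Your Website - Home</title>
<link rel="canonical" href="http://www.yourwebsite.com"/>
</head>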
When a search engine bot lands on your website, it looks for a simple text file called robots.txt. This is a simple yet very important file that tells the search engine spiders, or crawlers, which folders and files need to be crawled (and indexed) and which ones should be skipped. Search engines will generally try to crawl and index every web page on your website unless you explicitly tell them not to, using this simple text file. Though it is only a plain text file, it can help your website's search engine ranking by communicating your site's crawling preferences: a good robots.txt file saves search engine spiders the time of crawling unwanted directories of your website and helps you gain an SEO advantage.
Follow these easy steps to create a basic robots.txt file with different crawling preferences. The default commands, shown after the steps below, tell the search engines to crawl and index all the files on your website:
1. Open the notepad application
2. Insert the following commands in the file
3. Save the file as “robots.txt”
4. Upload this file to the root directory of your web server
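The default, allow-everything version of the file contains just these two lines (the asterisk means the rule applies to every crawler, and the empty Disallow value means nothing is blocked):

User-agent: *
Disallow: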
Now, if you want to restrict crawler access to certain folders on your website, you need to make some minor amendments.
Imagine that you have a folder named "windows" on your server that you don't want the crawlers to access. Then modify the file as follows:
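Assuming the folder sits at /windows/ in the root of your server, the file would look like this:

User-agent: *
Disallow: /windows/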
And if you want to block crawler access to all the files and folders on your website, use the following commands:
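Disallowing the root path ( / ) blocks every crawler from every file and folder:

User-agent: *
Disallow: /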
Note that the classic robots.txt syntax revolves around the Disallow directive: by default, all the files on your website/blog are automatically crawled and indexed unless you explicitly disallow them using the robots.txt file, or if the bots do not encounter such a file on your server at all. Most major search engines, including Google, also recognize an Allow directive, which is useful when you want to expose a single file inside an otherwise disallowed folder.
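For example, if there were a hypothetical page named readme.htm inside the blocked windows folder that you did want crawled, the two directives could be combined like this:

User-agent: *
Disallow: /windows/
Allow: /windows/readme.htm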
Keep these 2 SEO techniques in mind: they can help you improve and preserve your website's search engine rankings and make your online business a success in the long run.
Image Attribution: digitalart / FreeDigitalPhotos