Search Engine Optimization With Robots.txt

Monday, September 27, 2010

Sorry friends, my English is not good yet; I am still practicing. OK!

Every website owner wants to appear on the first page of search results for the keywords that match their topics, but it is not easy, because there is a lot to do to optimize a web page into what the search engines want. In my last article I discussed how to register with search engines and how to create a sitemap so that a website can be read easily by search engines. In this article I will write a little about how to limit the search engines so that they do not index everything carelessly. There are certain rules that the spiders (search engine crawlers) must obey when they index a web page. These rules are written in a file called "robots.txt".

Most websites or blogs have files and folders that there is no point in having indexed by search engines, such as image files, admin files, or files that you consider secret. You can tell the spiders not to index these files by creating a robots.txt file. Creating this file is very easy: you only need Notepad (or any plain-text editor) to write the robots.txt code. Example code looks like this:

User-agent: *
Disallow: /images/
Disallow: /feed/

This code means that no spider is allowed to access or index the contents of the /images/ directory or any URL that begins with /feed/.
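If you want to check rules like these before uploading them, here is a minimal sketch using Python's standard urllib.robotparser module. The bot name "ExampleBot", the helper name is_allowed, and the test paths are only assumptions for illustration; they are not part of any real site.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this article, one directive per line.
RULES = """\
User-agent: *
Disallow: /images/
Disallow: /feed/
"""


def is_allowed(path, user_agent="ExampleBot"):
    """Return True if the rules above let user_agent fetch the given path."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())  # parse the text directly, no network access
    return parser.can_fetch(user_agent, "http://myweb.com" + path)


print(is_allowed("/images/logo.png"))  # False: begins with /images/
print(is_allowed("/feed/rss"))         # False: begins with /feed/
print(is_allowed("/about.html"))       # True: no rule matches
```

This only shows how a well-behaved parser reads the rules; each real search engine uses its own parser and may behave a little differently.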

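You can copy and paste the code below to create a robots.txt file. The paths are for a WordPress blog installed in the /log/ directory, so change them to match your own site: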

User-agent: Googlebot
Disallow: /log/*/trackback
Disallow: /log/*/feed
Disallow: /log/*/comments
Disallow: /log/*?*
Disallow: /log/*?
Disallow: /log/search

User-agent: *
Disallow: /cgi-bin/
Disallow: /log/wp-admin/
Disallow: /log/wp-includes/
Disallow: /log/wp-content/plugins/
Disallow: /log/wp-content/themes/
Disallow: /log/trackback
Disallow: /log/comments
Disallow: /log/feed
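The Googlebot rules above use "*" as a wildcard, which Google documents as "any sequence of characters". As far as I know, Python's urllib.robotparser does not interpret wildcards inside paths, so the sketch below is my own rough approximation of that matching, not Google's actual implementation. The function names and the sample paths are assumptions for illustration.

```python
import re


def rule_to_regex(rule):
    """Turn a robots.txt path rule into a regex ("*" = any characters)."""
    anchored = rule.endswith("$")  # a trailing "$" means "the URL must end here"
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with ".*" where "*" stood.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def rule_matches(rule, path):
    """Return True if the rule matches the path from its first character."""
    return rule_to_regex(rule).match(path) is not None


print(rule_matches("/log/*/trackback", "/log/2010/09/my-post/trackback"))  # True
print(rule_matches("/log/*?*", "/log/index.php?p=12"))                     # True
print(rule_matches("/log/*/feed", "/log/about.html"))                      # False
```

A rule without a wildcard still works as a simple prefix, so "/log/search" also matches "/log/search?q=seo".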

Or you can create a robots.txt file to your liking using this tool.

Next, put the robots.txt file in the root directory of your website.

The robots.txt file must be in the main directory (the root directory), because when a search engine comes to your site it goes directly there, for example to http://myweb.com/robots.txt. If the search engine does not find the file in the root directory, it concludes that your site does not have a robots.txt file and indexes everything it discovers while crawling your site. Do not be surprised, then, if you see the entire contents of the site, including your confidential files, indexed and displayed in search engines.
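To make the root directory rule concrete, here is a small sketch of how a crawler could work out where to look for robots.txt from any page address. The function name robots_url and the sample page address are assumptions for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt address for the site that page_url belongs to."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, and replace the rest with /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://myweb.com/blog/2010/09/post.html"))
# http://myweb.com/robots.txt
```

Whatever page the crawler starts from, the answer is always the file in the root, which is why a robots.txt placed in a subfolder is never read.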

What I wrote above is a way to limit the search engine spiders; it does not fully protect your files, because on the Internet there are bad spiders as well as good ones. For that reason you should not put confidential files in your web directory. A good spider that finds the restrictions will immediately understand that it is not allowed to index those files, but to a bad spider the restrictions mean nothing, and such spiders are usually run by cyber criminals.

I hope this article is useful. If you want to add anything, please leave a comment.