How to Optimize Web Pages for Search Engines?
Most web pages are found by people using a search engine such as Google or Yahoo. For most websites, search engines are the main source of traffic. Web page optimization means building web pages so that search engines are more likely to find them and rank them highly.
Do Web Pages Need Optimization?
If you want to rank highly in the search engines, the short answer is yes. But what if you use other promotional methods and aren't really worried about search engine rankings?
In that case, many would say forget about optimization. The reality, however, is that all web pages need some basic optimization. By not optimizing, you waste potentially valuable traffic. And what would you say if one of your clients phoned and said:
"Hi Jim, I was looking for your website on the Internet to check one of your products, but I couldn't find it! You do have one, don't you?"
Surely you want to be found when someone types your company name into a search engine!
Local Search and Optimization
Most people see the Internet as global and therefore not really relevant to a small local business. That used to be true, but the situation is changing rapidly with local search.
As of 2005, the major search engines are all introducing local search capabilities. It is only a matter of time before local search replaces the Yellow Pages for finding local businesses and services.
You may be a plumber in "Never Never Land" and be of no interest to someone looking for a plumber in Texas, but surely you want to be found when someone types plumber "never never land" into a search engine.
So surely a web page that starts
Jones the plumber serving the needs of the local community covering never never land and surrounding areas including under milk wood etc.
is better than one that says:
Jones the plumber serving the needs of the local community since 1955.
Basic Optimization
Whether you build your own website or hire a web designer or design company, you need to be aware of web page optimization and make sure that, at the very least, the basics are done.
Description Tag
The page description should accurately describe your web page; search engines usually display it along with the title. Keep it under 200 characters, ideally under 140.
It should contain one or two of your keywords or keyword phrases, with the most important one near the beginning. Because search engines usually display it, it should invite the reader to click.
When searchers view the results, they quickly scan the descriptions of the top entries to check whether they contain what they are looking for.
Use:
<META NAME="Description" CONTENT="content">
Example:
<META NAME="Description" CONTENT="Bungxxee jumping in Texas! Yes you can! Here Bungxxee jumping is done from a specially built tower...">
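The length and placement rules above are simple enough to check before publishing. Below is a minimal Python sketch; `check_description` is a hypothetical helper (not part of any SEO tool), and treating "near the beginning" as "within the first 40 characters" is an assumption made for illustration.

```python
def check_description(description: str, keyword: str, near_start: int = 40) -> list[str]:
    """Return warnings for a meta description (hypothetical helper).

    Limits follow the text: under 200 characters, ideally under 140.
    `near_start` is an assumed cut-off for "near the beginning".
    """
    warnings = []
    length = len(description)
    if length >= 200:
        warnings.append(f"too long: {length} characters (limit 200)")
    elif length >= 140:
        warnings.append(f"{length} characters; ideally under 140")
    # Case-insensitive position of the most important keyword (-1 if absent).
    position = description.lower().find(keyword.lower())
    if position == -1:
        warnings.append(f"keyword {keyword!r} not found")
    elif position > near_start:
        warnings.append(f"keyword {keyword!r} first appears at character {position}")
    return warnings


example = ("Bungxxee jumping in Texas! Yes you can! "
           "Here Bungxxee jumping is done from a specially built tower...")
print(check_description(example, "bungxxee jumping"))  # []
```

The example description passes: it is well under 140 characters and opens with the keyword.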
Keywords Tag
Search engines once used this tag to determine page relevancy, but the major engines (except Yahoo) no longer use it. The tag contains a list of keywords and keyword phrases, separated by commas.
Try to limit it to about 10 words; fewer is usually better. Don't be tempted to enter hundreds of keywords just because you can. Don't repeat a keyword more than 3 times, as some search engines and indexes will penalise you for spamming.
I tend to use 1 to 4 keywords at most.
Use:
<META NAME="Keywords" CONTENT="keyword1, keyword2">
Example:
<META NAME="Keywords" CONTENT="Texas bungxxee jumping, bungxxee jumping texas">
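The keyword tag rules (about 10 words, no keyword more than 3 times) can be checked the same way. In this sketch, `check_keywords_tag` is a hypothetical helper, and it counts repetitions of individual words across all phrases, which is one possible reading of "repeat your keyword".

```python
from collections import Counter


def check_keywords_tag(content: str, max_words: int = 10, max_repeats: int = 3) -> list[str]:
    """Return warnings for a Keywords CONTENT value (hypothetical helper)."""
    # Split on commas into phrases, dropping empty entries such as a trailing comma.
    phrases = [p.strip() for p in content.split(",") if p.strip()]
    words = [w.lower() for phrase in phrases for w in phrase.split()]
    warnings = []
    if len(words) > max_words:
        warnings.append(f"{len(words)} words; aim for about {max_words} or fewer")
    # Case-insensitive count of each word across all phrases.
    for word, count in Counter(words).items():
        if count > max_repeats:
            warnings.append(f"{word!r} repeated {count} times (max {max_repeats})")
    return warnings


print(check_keywords_tag("Texas bungxxee jumping, bungxxee jumping texas"))  # []
```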
Meta Tag Summary
Here is the complete block: the TITLE tag followed by the two META tags.
<TITLE>Bungxxee Jumpping Texxas</TITLE>
<META NAME="Description" CONTENT="Bungxxee jumping in Texas! Yes you can! Here Bungxxee jumping is done from a specially built tower...">
<META NAME="Keywords" CONTENT="Texas bungxxee jumping, bungxxee jumping texas">
Important note: I have inserted extra characters in the keywords used as examples in this article to prevent the search engines ranking this page for the example keywords.
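If you generate pages from a script, building this block in code avoids a common slip: a double quote inside the description ends the CONTENT attribute early. The sketch below uses Python's standard `html.escape`; `meta_block` is a hypothetical helper name, and the uppercase tag style simply mirrors the block above.

```python
from html import escape


def meta_block(title: str, description: str, keywords: list[str]) -> str:
    """Build the TITLE and META tags (hypothetical helper).

    escape(..., quote=True) turns &, <, > and quote characters into
    entities, so a stray '"' in the text cannot break an attribute.
    """
    joined_keywords = ", ".join(keywords)
    return "\n".join([
        f"<TITLE>{escape(title)}</TITLE>",
        f'<META NAME="Description" CONTENT="{escape(description, quote=True)}">',
        f'<META NAME="Keywords" CONTENT="{escape(joined_keywords, quote=True)}">',
    ])


print(meta_block(
    "Bungxxee Jumpping Texxas",
    "Bungxxee jumping in Texas! Yes you can! Here Bungxxee jumping is done from a specially built tower...",
    ["Texas bungxxee jumping", "bungxxee jumping texas"],
))
```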
Heading Tags
Search engines give extra weight to text within the heading tags H1 to H6, so it is important to use these tags rather than simply a larger font size. Heading tags work just like newspaper headlines: they draw in both the reader and the search engines.
The Opening Headline
The opening headline should sum up the contents of the web page. It should be designed to attract both the reader and the search engines and contain your keywords or keyword phrases. It is often identical or very similar to the page title.
The H2 tag is commonly used because most people find the H1 tag too large. I tend to use the H1 tag only for short headlines. A headline should ideally be under 40 characters (64 at most). Short, punchy headlines are generally better.
The opening headline is followed by body text that should contain your keywords or keyword phrases. Your keyword should appear in the opening sentence and once more within the first 500 characters (roughly 6 lines) or the opening paragraph.
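These rules translate directly into a few checks. This is a minimal sketch: `check_opening` is a hypothetical helper, and it assumes the opening sentence ends at the first ".", "!" or "?" followed by whitespace, which is a crude approximation.

```python
import re


def check_opening(headline: str, body: str, keyword: str) -> list[str]:
    """Check an opening headline and the body text after it (hypothetical helper)."""
    kw = keyword.lower()
    warnings = []
    # Headline length: ideally under 40 characters, 64 at most.
    if len(headline) > 64:
        warnings.append(f"headline is {len(headline)} characters (max 64)")
    elif len(headline) >= 40:
        warnings.append(f"headline is {len(headline)} characters; ideally under 40")
    if kw not in headline.lower():
        warnings.append("keyword missing from headline")
    # Crude assumption: the opening sentence ends at the first ., ! or ? plus whitespace.
    opening_sentence = re.split(r"(?<=[.!?])\s+", body.strip(), maxsplit=1)[0]
    if kw not in opening_sentence.lower():
        warnings.append("keyword missing from opening sentence")
    # Opening sentence plus "once more": at least two hits in the first 500 characters.
    if body[:500].lower().count(kw) < 2:
        warnings.append("keyword should appear at least twice in the first 500 characters")
    return warnings


print(check_opening(
    "Bungxxee Jumping in Texas",
    "Bungxxee jumping in Texas is now possible. Our tower makes bungxxee jumping safe.",
    "bungxxee jumping",
))
```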
Other Headline Tags H3-H6
These tags are used for emphasis and should again contain your keywords or some variation of them. They are ideally used for secondary keywords. Again, the body text following each headline should contain your keyword at least once.
The general structure is:
headline H1 or H2 -- contains keyword/keyword phrase
Body text -- contains keyword/keyword phrase at least once, at most twice
headline H3-H6 -- contains keyword/keyword phrase
Body text -- contains keyword/keyword phrase at least once, at most twice
headline H3-H6 -- contains keyword/keyword phrase
Body text -- contains keyword/keyword phrase at least once, at most twice
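The structure above can be checked against a finished page. This sketch uses Python's standard `html.parser.HTMLParser` to split a page into sections at each H1-H6 tag. `SectionCollector` and `check_structure` are hypothetical names, text before the first heading is simply ignored, and keyword matching is a plain case-insensitive substring count.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SectionCollector(HTMLParser):
    """Split a page into [tag, heading text, body text] sections (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sections = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase, so <H2> arrives as "h2".
        if tag in HEADING_TAGS:
            self.sections.append([tag, "", ""])
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if not self.sections:
            return  # text before the first heading is ignored in this sketch
        index = 1 if self._in_heading else 2
        self.sections[-1][index] += data


def check_structure(html: str, keyword: str) -> list[str]:
    collector = SectionCollector()
    collector.feed(html)
    collector.close()
    kw = keyword.lower()
    warnings = []
    if collector.sections and collector.sections[0][0] not in ("h1", "h2"):
        warnings.append(f"first heading is <{collector.sections[0][0]}>; use H1 or H2")
    for tag, heading, body in collector.sections:
        label = f"<{tag}> {heading.strip()!r}"
        if kw not in heading.lower():
            warnings.append(f"{label}: keyword missing from heading")
        count = body.lower().count(kw)
        if not 1 <= count <= 2:
            warnings.append(f"{label}: keyword appears {count} times in body (want 1-2)")
    return warnings


page = ("<h1>Bungxxee jumping</h1><p>Try bungxxee jumping today.</p>"
        "<h3>Bungxxee jumping safety</h3><p>Our tower is safe.</p>")
print(check_structure(page, "bungxxee jumping"))
```

Here the second section is flagged because its body text never mentions the keyword.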
Keyword Prominence and Placement
Just like people, search engines place more emphasis on the opening words of a sentence than on those in the middle. So, where possible, try to open sentences with your keywords.
When you view your page as a whole, the keywords should be concentrated mainly at the beginning and at the end, with a few in the middle.
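To see whether your keywords cluster at the beginning and end, you can measure where each occurrence falls. The hypothetical `keyword_positions` helper below splits the text into words and counts matches in the first, middle and last third of the page; the equal-thirds split is an assumption for illustration, not a search engine rule.

```python
import re


def keyword_positions(text: str, keyword: str) -> dict[str, int]:
    """Count keyword matches in the first, middle and last third of a page (hypothetical helper)."""
    words = re.findall(r"[\w']+", text.lower())
    phrase = re.findall(r"[\w']+", keyword.lower())
    buckets = {"beginning": 0, "middle": 0, "end": 0}
    if not words or not phrase:
        return buckets
    n, k = len(words), len(phrase)
    for i in range(n - k + 1):
        if words[i:i + k] == phrase:
            fraction = i / n  # 0.0 = first word, close to 1.0 = last word
            if fraction < 1 / 3:
                buckets["beginning"] += 1
            elif fraction < 2 / 3:
                buckets["middle"] += 1
            else:
                buckets["end"] += 1
    return buckets


text = "plumber " + "filler " * 10 + "plumber " + "filler " * 10 + "plumber"
print(keyword_positions(text, "plumber"))
```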
Keyword Density
Keyword density measures how often your keywords appear relative to the total number of words on the page. A density of 5 to 10 keywords per 100 words is generally regarded as acceptable. I aim for about 5 per 100 words, with a maximum of 15 repeats per page.
If you repeat a keyword too often, search engines may think you are spamming and downgrade or even ban your page. So start low; you can always tweak it upward later.
Don't get too caught up in keyword density and analysis. Generally, positioning matters more than frequency.
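Keyword density is simple arithmetic: keyword occurrences divided by total words, times 100. This minimal sketch computes it; `keyword_density` is a hypothetical helper, and counting a multi-word phrase as one hit (when all its words appear in order) is an assumption, since tools differ on this.

```python
import re


def keyword_density(text: str, keyword: str) -> tuple[int, int, float]:
    """Return (keyword hits, total words, hits per 100 words) (hypothetical helper)."""
    words = re.findall(r"[\w']+", text.lower())
    phrase = re.findall(r"[\w']+", keyword.lower())
    k = len(phrase)
    hits = 0
    if k:
        # Slide over the word list and count exact, in-order phrase matches.
        hits = sum(1 for i in range(len(words) - k + 1) if words[i:i + k] == phrase)
    per_100 = 100 * hits / len(words) if words else 0.0
    return hits, len(words), per_100


page = ("plumber " + "word " * 19) * 15   # 300 words, "plumber" 15 times
print(keyword_density(page, "plumber"))   # (15, 300, 5.0)
```

A 300-word page that uses its keyword 15 times lands exactly on the target of 5 per 100 words with a maximum of 15 repeats.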
Search Engines and Visitors
People have checkbooks; search engines don't! Write first to please the visitor, not the search engines. If you are in any doubt about the placement or frequency of your keywords, the test is simple: does it make sense to my reader? If it doesn't, change it!
How to Set Up a robots.txt File to Control Search Engine Spiders?
When I started writing my first website, I never thought I would have any reason to create a robots.txt file. After all, didn't I want search engine robots to spider, and thus index, every document on my site? Yet today all my sites, including Affiliated-Business.com, have a robots.txt file in their root directory. This article explains why you might also want a robots.txt file on your sites, how to create one, and some common mistakes new webmasters make with it.
For those new to it, the robots.txt file is merely a text file implementing what is known as the Standard for Robot Exclusion. Placed in the main (root) directory of a website, it advises spiders and other robots which directories or files they should not access. The file is purely advisory: not all spiders bother to read it, let alone heed it. However, most, if not all, of the spiders the major search engines send to index your site will read it and respect the rules it contains.
Why is a Robots.txt File Important?
1. It Can Avoid Wastage of Server Resources
At the time of writing, as far as I know, many search engine spiders do not bother to index the scripts on your site (such as CGI or PHP scripts). However, some do, including one of the major players, Google.
Spiders that do index scripts call them just as a browser would, complete with all the special characters. If your site is like mine, where the scripts are meant solely for human use and serve no practical purpose for a search engine (why should a search engine invoke my site-navigation script when it can just crawl the direct links?), you may want to block spiders from the directories that contain your scripts. For example, I block spiders from my cgi-bin directory. Hopefully, this reduces the load on the web server by removing unnecessary script executions.
Of course, there are occasional ill-behaved robots that hit your server at high speed. Such spiders can bring down your server, or at least slow it down for the real users trying to access it. If you know of any such spiders, you might want to exclude them too, which you can do with a robots.txt file. Unfortunately, ill-behaved spiders often ignore robots.txt files as well.
2. It Can Save Your Bandwidth
If you look at your website's logs, you will undoubtedly find many requests for the robots.txt file from various search engine spiders. If, like me, you have a customized 404 document (which loads each time a visitor requests a page that does not exist on your site) and no robots.txt file, spiders asking for robots.txt will be served that 404 document instead. My 404 document is fairly large, so spiders end up loading it repeatedly throughout the day, adding to my already large bandwidth problems. In such a case, a small robots.txt file may save you some bandwidth (yes, I know, it's not that much).
Some spiders may also request files you feel they should not. For example, one search engine requests graphic files (".gif" files) on my sites. Since I see little reason to let it index my graphics, waste my bandwidth, and possibly infringe my copyright, I ban it (and in fact all spiders) from my graphics directory in my robots.txt file.
3. It Removes Clutter from your Web Statistics
I don't know about you, but one of the things I check in my web statistics is the list of URLs that visitors tried to access but that returned a 404 File Not Found error. This often tells me whether I misspelled one of the internal links on my sites (yes, I know, I should have checked all links in the first place, but mistakes do happen).
If you don't have a robots.txt file, you can be sure that /robots.txt will feature in your web statistics 404 report, adding clutter and perhaps distracting you from the genuinely bad URLs that need your attention.
4. Refusing a Robot for Copyright Reasons
Sometimes you don't want a particular spider to index your site, because you feel that search engine infringes your copyright or for some other reason. For example, Picsearch (http://www.picsearch.com/) downloads your images and creates thumbnail versions of them for people to search. The thumbnails are stored on their web server. If, as a webmaster, you do not want this, you can exclude their spider from indexing your site with a robots.txt directive (the spider apparently obeys the rules in that file).
How to Set Up a Robots.txt File
Writing a robots.txt file could not be easier. It's just an ASCII text file that you place at the root of your domain. For example, if your domain is www.yourdomain.com, you will place the file at www.yourdomain.com/robots.txt.
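Spiders look for the file only at that root location, whatever page they happen to be crawling. This sketch shows how a crawler can derive it from any page address using Python's standard `urllib.parse.urljoin`; `robots_url` is a hypothetical helper name.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the robots.txt location for the site hosting page_url (hypothetical helper)."""
    # A reference starting with "/" replaces the whole path (and drops any query),
    # keeping only the scheme, host and port of the page URL.
    return urljoin(page_url, "/robots.txt")


print(robots_url("http://www.yourdomain.com/articles/seo/page.html"))
# http://www.yourdomain.com/robots.txt
```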
The file basically names a spider on one line (the "User-agent" line), followed by the directories or files it is not allowed to access ("Disallow" lines), each on a separate line. You can use the wildcard character "*" instead of naming a specific spider; the rules then apply to all spiders. Note that the robots.txt file is a robots exclusion file (with the emphasis on "exclusion"): the original standard provides no way to tell spiders to include a file or directory.
Take the following robots.txt file for example:
User-agent: *
Disallow: /cgi-bin/
The above two lines, when inserted into a robots.txt file, inform all robots (since the wildcard asterisk "*" character was used) that they are not allowed to access anything in the cgi-bin directory and its descendants. That is, they are not allowed to access /cgi-bin/whatever.cgi or even a file or script in a subdirectory of cgi-bin, such as /cgi-bin/anything/whichever.cgi.
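Before uploading a robots.txt file, you can check how a rule-following crawler would read it. Python's standard library includes `urllib.robotparser`, which implements the exclusion rules; this minimal sketch feeds it the two lines above. "AnyBot" is a made-up crawler name, and real search engine spiders may differ from Python's parser in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The two-line example from the text.
rules = """\
User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # parse() accepts an iterable of lines

# The crawler name is arbitrary here, because the rules apply to "*".
for path in ["/cgi-bin/whatever.cgi", "/cgi-bin/anything/whichever.cgi", "/index.html"]:
    print(path, "allowed" if parser.can_fetch("AnyBot", path) else "blocked")
```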
If you have a particular robot in mind, such as the Picsearch robot, you may have lines like the following:
User-agent: psbot
Disallow: /
This means that the Picsearch robot, "psbot", should not try to access any file in the root directory "/" or any of its subdirectories. This effectively bans psbot from your entire website.
You can have multiple Disallow lines for each user agent (i.e., for each spider). Here is an example of a longer robots.txt file:
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/

User-agent: psbot
Disallow: /
The first block disallows all spiders from the images and cgi-bin directories. The second block disallows the psbot spider from every directory.
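The same standard-library parser shows how the two blocks interact: a spider with its own User-agent record follows only that record, while every other spider falls back to the "*" record. In this sketch, "Googlebot" stands in for any spider other than psbot, and the file paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/

User-agent: psbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

checks = [
    ("Googlebot", "/images/logo.gif"),  # blocked by the "*" record
    ("Googlebot", "/index.html"),       # allowed: not listed anywhere
    ("psbot", "/index.html"),           # blocked: psbot is banned from "/"
]
for agent, path in checks:
    print(f"{agent:10} {path:20} {parser.can_fetch(agent, path)}")
```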
It is possible to exclude a spider from indexing a particular file. For example, if you don't want Google's image search robot to index a particular picture, say, mymugshot.jpg, you can add the following:
User-agent: Googlebot-Image
Disallow: /images/mymugshot.jpg
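A quick check of the single-file rule with Python's `urllib.robotparser`; holiday.jpg is a made-up file name used for contrast. Only the named file is blocked, and only for the named spider.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot-Image
Disallow: /images/mymugshot.jpg
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("Googlebot-Image", "/images/mymugshot.jpg"))  # False: the listed file
print(parser.can_fetch("Googlebot-Image", "/images/holiday.jpg"))    # True: other images allowed
print(parser.can_fetch("Googlebot", "/images/mymugshot.jpg"))        # True: rule names another spider
```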
Remember to add the trailing slash ("/") if you are indicating a directory. If you simply add
User-agent: *
Disallow: /privatedata
the robots will be disallowed from accessing /privatedata.html, /privatedataandstuff.html and the whole directory tree beginning at /privatedata/ (and so on). In other words, there is an implied wildcard character following whatever you list in the Disallow line.
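The implied wildcard is easy to confirm. This sketch compares the rule with and without the trailing slash; `blocked` is a hypothetical helper wrapping Python's `RobotFileParser`, "AnyBot" is a made-up crawler name, and /privatedata/report.html is a made-up path.

```python
from urllib.robotparser import RobotFileParser


def blocked(rules: str, path: str) -> bool:
    """True if a generic crawler may NOT fetch path under rules (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch("AnyBot", path)


without_slash = "User-agent: *\nDisallow: /privatedata\n"
with_slash = "User-agent: *\nDisallow: /privatedata/\n"

for path in ["/privatedata.html", "/privatedataandstuff.html", "/privatedata/report.html"]:
    print(path, blocked(without_slash, path), blocked(with_slash, path))
```

Without the slash all three paths are blocked; with it, only the file inside the /privatedata/ directory is.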
Where Do You Get the Name of the Robots?
If you want to block a particular spider, you have to find out its name. The best way to do this is to check the search engine's website. Respectable engines usually have a page explaining how to prevent their spiders from accessing certain files or directories.
Common Mistakes in Robots.txt
1. It's Not Guaranteed to Work
As mentioned earlier, although the robots.txt format is described in a document called "A Standard for Robot Exclusion", not all spiders and robots bother to heed it. Listing something in your robots.txt is no guarantee that it will be excluded. If you really need to protect something, use a .htaccess file (if your site runs on an Apache server).
2. Don't List Your Secret Directories
Anyone can read your robots.txt file, not just robots. For example, typing http://www.google.com/robots.txt into your browser will show you Google's own robots.txt file. Some new webmasters seem to think they can list their secret directories in robots.txt to prevent them from being accessed. Far from it: listing a directory in a robots.txt file often attracts attention to it! In fact, some spiders (like certain spammers' email-harvesting robots) make a point of checking robots.txt for excluded directories to spider.
3. Only One Directory/File per Disallow line
Don't try to be clever by putting multiple directories on one Disallow line. This will probably not work the way you think, since the Robots Exclusion Standard provides for only one directory or file per Disallow line.
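Python's `urllib.robotparser` illustrates the problem: it reads "/cgi-bin/ /images/" as a single path containing a space, so neither directory ends up blocked. Other crawlers may handle such a line differently, which is exactly why it is unreliable. `allowed` is a hypothetical helper and "AnyBot" a made-up crawler name.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, path: str) -> bool:
    """True if a generic crawler may fetch path under rules (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("AnyBot", path)


wrong = "User-agent: *\nDisallow: /cgi-bin/ /images/\n"          # two paths on one line
right = "User-agent: *\nDisallow: /cgi-bin/\nDisallow: /images/\n"  # one path per line

for name, rules in [("one line", wrong), ("two lines", right)]:
    print(name, allowed(rules, "/cgi-bin/test.cgi"), allowed(rules, "/images/logo.gif"))
```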
It's Worth It
Even if you want spiders to access all your directories, a simple robots.txt file containing the following may be useful:
User-agent: *
Disallow:
With no file or directory listed on the Disallow line, you are telling spiders that every directory on your site may be accessed. At the very least, this file will save you a few bytes of bandwidth each time a spider visits your site (or more if your 404 file is large), and it will also remove robots.txt from the bad links report in your web statistics.
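Checked with Python's standard `urllib.robotparser`, an empty Disallow value blocks nothing. A minimal sketch; "AnyBot" is a made-up crawler name.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
# An allow-everything file: a wildcard record with an empty Disallow value.
parser.parse(["User-agent: *", "Disallow:"])

print(parser.can_fetch("AnyBot", "/any/page.html"))  # True
```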
