How to Migrate your Website (3 of 5)
Welcome to part three of our guide on how to migrate your website! This guide is a five-part series describing some high-level techniques that are common to website and server migrations. Don’t forget to check out part two of this guide for information on website log monitoring using Linux.
What’s Going On?
In this part, we will discuss creating rewrite rules and tracking URLs that changed between the old site and the new site.
Setting up Rewrite Rules to Capture lost and changed urls
Migrating from one CMS to another has become easier, as many of these applications now follow the same ideas: friendly URLs and a database-driven architecture. But if you are migrating from an older CMS that relied on parameters in the query string, or if the friendly URL formats differ between the two CMSs, you will need to make sure the old URLs are routed to the correct new pages. The same applies if your new website has a different menu structure or if filenames have changed. I will go over how to write rewrite rules for query string matching and full filename matching.
Finding urls that need to be updated
There are a couple of ways to go about this, and using a combination of the two is probably a good plan regardless. If your site has been on the web for more than a week, there is probably existing information about it in search engine caches, or links to it from other websites. Old links that lead to 404 pages will hurt your page rank and other aspects of SEO, as well as customer loyalty and retention.
We have directions for the two main Search Engines, Google and Bing.
Using Your Google Webmaster Account to find Links
Log in to your Google Webmaster Tools account.
If you do not have a Google Webmaster Tools account, sign up for one now or skip to the next option.

Select your site from the site listings.

Click “Your Site on the Web”, then click “Internal Links”. This screen shows a list of links that are used internally on your website and that also appear in the Google search engine.

Using the Google Site Search to Find Links
Go to Google. Using the advanced search keywords, we are going to search for all links related to your website.
site:atws.ca

Using Your Bing Webmaster Account to find Links
Log in to your Bing Webmaster Tools account. If you do not have one, sign up for one now or skip to the next option.

Select your site from the site listings.

Click on the “Index” tab, then click “Index Explorer”. This screen shows a list of links that are used internally on your website and that also appear in the Bing search engine.

Using the Bing Site Search to Find Links
Go to http://www.bing.com. Using the advanced search keywords, we are going to search for all links related to your website.
site:atws.ca -index

Using what we have found to Create Rewrite Rules
Due to caching on the internet, search engine caching, transparent proxies, proxy caching servers and a host of other related technologies, there may be servers holding old information, so we need to make sure that we redirect people to the proper new pages on our website even if they request an old URL. There are more reasons why this is important (SEO, etc.), but we are not going to get into that in this post.
There are two places you can put Rewrite Rules in Apache:
- .htaccess files
  - You would usually use these if you do not have access to the virtual hosts files.
  - Typically resides in the website root directory.
  - Anyone with FTP access to the website directories can change this file (with the correct permissions, of course).
- virtual hosts files
  - The main file(s) for all of your Apache virtual host entries.
  - More control over who can make changes, and less likely to get overridden, moved or deleted.
  - Typically resides in the /etc/apache2/sites-available directory.
We are going to work with our rules in the virtual host entries for the purpose of this guide.
Creating the Rewrite Rules
First, we need to make sure the rewrite engine is loaded in Apache:
sudo a2enmod rewrite
sudo apache2ctl graceful
The first command enables the rewrite module; the second gracefully restarts Apache so that the module is loaded into the running configuration.
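Before going further, it can help to confirm that the module really is loaded. The following is only a sketch, assuming a Debian/Ubuntu-style install where apache2ctl is available; the helper name has_rewrite_module is made up for this example.

```shell
# Hypothetical helper: reads a module list on stdin and succeeds
# if rewrite_module appears in it.
has_rewrite_module() {
    grep -q 'rewrite_module'
}

# Usage (requires Apache to be installed):
#   sudo apache2ctl -M | has_rewrite_module && echo "mod_rewrite is loaded"
#   sudo apache2ctl configtest   # syntax-check the configuration before reloading
```

Running the configuration test before every graceful restart is a cheap way to catch a mistyped rule before it affects the live site.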
Determine what type of rewrite rule to use
- Did your old website use flat files, or have a friendly URL rewrite rule that makes pages appear as flat files?
- Did your old website use a friendly URL rewrite rule that is based on directories?
- Did your old website use URL query string parameters?
The good thing is that Apache will only see what is sent to it, so we do not have to guess what could be happening in the background. Also, keep in mind that these rewrite rules could be used in conjunction, or you may have to come up with your own combination to work with your previous setup.
Rewrite Rules for Flat Files or Rewrite Rules that Appear as Flat Files
This rewrite rule is probably one of the easiest to start with, because Apache sees only what is being passed, so we can match against the complete file name to get exact matches.
Scenario
In this setup, we have these URLs to match against:
- http://www.atws.ca/index.htm
- http://www.atws.ca/index.html
- http://www.atws.ca/Index.htm
- http://www.atws.ca/Index.html
- http://www.atws.ca/home.htm
- http://www.atws.ca/home.html
- http://www.atws.ca/Home.htm
- http://www.atws.ca/Home.html
Here is how the rule we are about to create for these URLs breaks down:
- RewriteRule
  - Starts the rule.
- ^/(index|home)\.(htm|html)$
  - The regular expression to match.
  - Start of string (^)
  - Start of base URL (/)
  - Match either the “index” or the “home” filename: (index|home)
    - () is a group
    - | means “or”
  - (\.) The period in the filename needs to be escaped with a backslash, as the period is a special character that would otherwise match any character.
  - Match either the “htm” or the “html” file extension: (htm|html)
  - End of string ($)
- /?
  - The destination.
  - Rewrite path (/)
  - Disregard all additional query string parameters (?)
- [NC,R=301,L]
  - The flags, separated by commas with no spaces between them.
  - No Case (NC). Not case-specific matching; deals with “index” vs “Index”.
  - Permanent Redirect (R=301). Send a 301 header response.
  - Last Rule (L). Stop after this rule is applied.
The directory-based URLs, covered in the next rewrite section of this guide, are:
- http://www.atws.ca/old-services
- http://www.atws.ca/old-services/
- http://www.atws.ca/new-services
- http://www.atws.ca/new-services/
The rule for them breaks down like this:
- RewriteRule
  - Starts the rule.
- ^/(old-services|new-services)(/)?$
  - The regular expression to match.
  - Start of string (^)
  - Start of base URL (/)
  - Match either the “old-services” or the “new-services” path: (old-services|new-services)
  - Match one or zero “/” characters: (/)?
    - () is a group
    - ? means zero or one of whatever is in the group
  - End of string ($)
- /services?
  - The destination.
  - Rewrite path (/services)
  - Disregard all additional query string parameters (?)
- [NC,R=301,L]
  - The flags, separated by commas with no spaces between them.
  - No Case (NC). Not case-specific matching; deals with “old-services” vs “Old-Services”.
  - Permanent Redirect (R=301). Send a 301 header response.
  - Last Rule (L). Stop after this rule is applied.
The URLs with query string parameters, covered in the last rewrite section of this guide, are:
- http://www.atws.ca/index.php?p=services
- http://www.atws.ca/?p=services
- http://www.atws.ca/index.php?p=old-services
- http://www.atws.ca/?p=old-services
A few notes on the Rewrite Condition:
- The RewriteCond must be satisfied for the RewriteRule that follows it to be applied.
- We can still use all of the same regular expression functions here.
- Beware, the argument order of the Rewrite Condition is the opposite of the RewriteRule:
  - Condition: the variable to test, then the regular expression to match it against
  - Rule: the regular expression to match, then the destination
The condition and the rule break down like this:
- RewriteCond
  - Tells Apache to evaluate a Rewrite Condition.
- %{QUERY_STRING}
  - Accesses the QUERY_STRING server variable.
- ^p=(services|old-services)$
  - Exact match of “p=services” or “p=old-services”.
  - Start of string (^)
  - Match “p=”
  - Match either “services” or “old-services”: (services|old-services)
  - End of string ($)
- RewriteRule
  - Starts the rule.
- ^/(index\.php)?$
  - The regular expression to match.
  - Start of string (^)
  - Start of base URL (/)
  - Match one or zero occurrences of index.php: (index\.php)?
    - () is a group
    - ? means zero or one of whatever is in the group
    - The period in the filename needs to be escaped with a backslash, as the period is a special character.
  - End of string ($)
- /services?
  - The destination.
  - Rewrite path (/services)
  - Disregard all additional query string parameters (?)
- [R=301,L]
  - The flags, separated by commas with no spaces between them.
  - Permanent Redirect (R=301). Send a 301 header response.
  - Last Rule (L). Stop after this rule is applied.
For the flat-file URLs at the start of this scenario, we want them all to go to: http://www.atws.ca/
Creating the Rule
Turn On Rewrite Engine if it has not already been started in this Virtual Host Entry.
This needs to always be above all of the rules.
RewriteEngine On
Match against the base URL to make sure we are getting Exactly the URL we want to rewrite.
RewriteRule ^/(index|home)\.(htm|html)$ /? [NC,R=301,L]
The Results:
RewriteEngine On
RewriteRule ^/(index|home)\.(htm|html)$ /? [NC,R=301,L]
Using the () grouping with the pipe (|) allows us to match against a number of different values that should all go to the same destination. This way, you don’t need to write a separate rewrite rule for each filename. Keep in mind that packing too many alternatives into one rule can make it hard to read and understand.
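A pattern like this can be tried against sample paths before it goes anywhere near Apache. The sketch below uses grep -E, which behaves the same as Apache's regular expressions for a pattern this simple, with -i standing in for the [NC] flag. The helper name matches_flat_file_rule and the sample paths are only illustrations. Note that the leading slash is part of the path in a virtual host entry; in a .htaccess file Apache strips it, so the pattern would need adjusting there.

```shell
# Hypothetical helper: succeeds if the URL path would match the
# flat-file pattern. -i mirrors the [NC] (no case) flag.
matches_flat_file_rule() {
    printf '%s\n' "$1" | grep -Eiq '^/(index|home)\.(htm|html)$'
}

# Try a few paths, including two that should not match.
for path in /index.htm /Index.html /home.htm /Home.html /index.php /indexxhtm; do
    if matches_flat_file_rule "$path"; then
        echo "$path -> would redirect to /"
    else
        echo "$path -> no match"
    fi
done
```

The /indexxhtm sample is there to show why the period is escaped: with an unescaped period, that path would match as well.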
Rewrite Rules for Friendly URLs based on Directories
Matching against directories is very similar to the file-based matching, but without the filename. URLs like these are typically the result of an existing rewrite rule that creates friendly URLs.
Scenario
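In this setup, we have the URLs http://www.atws.ca/old-services and http://www.atws.ca/new-services, each with or without a trailing slash.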
With our new CMS, we want it to go to: http://www.atws.ca/services
Creating the Rule
Turn On Rewrite Engine if it has not already been started in this Virtual Host Entry.
This needs to always be above all of the rules.
RewriteEngine On
Match against the base url to make sure we are getting Exactly the URL we want to rewrite.
RewriteRule ^/(old-services|new-services)(/)?$ /services? [NC,R=301,L]
The Results
RewriteEngine On
RewriteRule ^/(old-services|new-services)(/)?$ /services? [NC,R=301,L]
Rewrite Rules for URLs with Query String Parameters
These rewrite rules are typically used in conjunction with the flat-file matching, but contain some additional conditions. With URL query string parameters, you need to request them from the Apache server variables, as they are not part of the base URL that is available to the rewrite rules.
Scenario
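On our old setup, we have the URLs http://www.atws.ca/index.php?p=services and http://www.atws.ca/index.php?p=old-services, each with or without index.php in the path (for example, http://www.atws.ca/?p=services).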
With our new CMS, we want it to go to: http://www.atws.ca/services
Creating the Rule
Turn On Rewrite Engine if it has not already been started in this Virtual Host Entry.
This needs to always be above all of the rules.
RewriteEngine On
Create the QUERY_STRING Condition for the Rule.
RewriteCond %{QUERY_STRING} ^p=(services|old-services)$
Match against the base URL to make sure we are getting exactly the URL we want to rewrite.
RewriteRule ^/(index\.php)?$ /services? [R=301,L]
The Results
RewriteEngine On
RewriteCond %{QUERY_STRING} ^p=(services|old-services)$
RewriteRule ^/(index\.php)?$ /services? [R=301,L]
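Because this redirect depends on two separate matches (the query string and the path), it is easy to get one half right and the other wrong. Here is a rough sketch of checking both together with grep -E. The helper name would_redirect_to_services is hypothetical, and no -i is used because this rule has no [NC] flag.

```shell
# Hypothetical helper: takes a URL path and a query string, and succeeds
# only when both the condition and the rule pattern would match.
would_redirect_to_services() {
    path=$1
    query=$2
    # Mirrors: RewriteCond %{QUERY_STRING} ^p=(services|old-services)$
    printf '%s\n' "$query" | grep -Eq '^p=(services|old-services)$' || return 1
    # Mirrors the pattern of: RewriteRule ^/(index\.php)?$
    printf '%s\n' "$path" | grep -Eq '^/(index\.php)?$'
}

# Usage:
#   would_redirect_to_services /index.php 'p=services' && echo "redirects"
#   would_redirect_to_services / 'p=contact' || echo "left alone"
```

Since the condition is anchored with ^ and $, a query string such as p=services&x=1 would not match; loosen the pattern if your old site appended extra parameters.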
Test, Test, Test…
Fire up your web browser and test all of the rewrite rules you just created. Watch the logs for errors; however, most problems will be apparent in the browser, because you will not end up where you wanted to go.
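Browsers tend to cache 301 redirects, so a command-line check can be a useful complement. This is a minimal sketch, assuming curl is installed; the helper name check_redirect is made up for this example, and the URLs in the usage lines are the sample URLs from this guide.

```shell
# Hypothetical helper: prints "<status code> <redirect target>" for one
# URL, without following the redirect.
check_redirect() {
    curl -s -o /dev/null -w '%{http_code} %{redirect_url}\n' "$1"
}

# Usage (needs network access to the migrated site):
#   check_redirect 'http://www.atws.ca/index.htm'
#   check_redirect 'http://www.atws.ca/old-services/'
#   check_redirect 'http://www.atws.ca/index.php?p=services'
```

For a working rule, the status code should be 301 and the target should be the new page.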
Following up on the Rewrite Rules
To follow the progress the search engines are making on re-indexing your changes, we can use their webmaster tools.
Using Google’s Webmaster Tools to Check for Crawl Errors
Diagnostics > Crawl errors

Using Bing’s Webmaster Tools to Check for Crawl Errors
Crawl > Crawl Details > Click on the HTTP Code you want to view.

If you find any broken URLs, go back to your rewrite rules and add or update rules as needed.
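Your own access log is another place to spot broken URLs between search engine crawls. The sketch below assumes Apache's common or combined log format, where the seventh whitespace-separated field is the request path and the ninth is the status code. The helper name top_404s and the log path in the usage line are assumptions; adjust both to your setup.

```shell
# Hypothetical helper: lists the most frequently requested paths that
# returned a 404, with a hit count in front of each path.
top_404s() {
    awk '$9 == 404 { print $7 }' "$1" | sort | uniq -c | sort -rn | head -n 20
}

# Usage; the log location is an assumption (typical on Debian/Ubuntu):
#   top_404s /var/log/apache2/access.log
```

Paths that show up repeatedly here are good candidates for new rewrite rules.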

