Note: If you don’t care about the technical details of switching over a site and its structure, you can probably skip this entry — it may confuse you, or simply bore the heck out of you.
For the old (second) version of this site, I used a lot of ASP scripting to generate dynamic content and automatically change navigation states based on directory paths and query strings. This new (third) version uses PHP and shifts around some of the directory structure. While I can change and control internal links pointing to my own content, lots of external links exist on other sites which point to files on stopdesign. I wanted to make sure as many of those links as possible still pointed to appropriate places on this site.
The largest block of files that changed on this site was the Log Archive. The old site used a date-based query string to dynamically pull in appropriate entries through a single master file: /log/default.asp. Each entry was saved as a fragment of code which waited to get pulled in through ASP’s Server.Execute method. A six-digit value for the date key (e.g. ?date=200306) pulled in all entries for that month. An eight-digit value (e.g. ?date=20030624) pulled in any entries for a specific day.
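To make the two key lengths concrete, here is a minimal sketch of how such a query string can be told apart. It is written in JavaScript (like the recursion example in the comments) rather than ASP, and it assumes the value arrives as plain digits. The function name parseOldDateKey and the shape of the returned object are hypothetical; only the six- and eight-digit formats come from the entry.

```javascript
// Hypothetical helper: classify an old ?date= value from /log/default.asp.
// Six digits (YYYYMM) meant a whole month; eight (YYYYMMDD) meant one day.
function parseOldDateKey(queryString) {
  // URLSearchParams accepts a leading "?" and decodes the value.
  const key = new URLSearchParams(queryString).get('date');
  if (key === null) return null;

  // Accept exactly six or eight digits and nothing else.
  const match = /^(\d{4})(\d{2})(\d{2})?$/.exec(key);
  if (!match) return null;

  const [, year, month, day] = match;
  return day === undefined
    ? { kind: 'month', year, month }
    : { kind: 'day', year, month, day };
}
```

For example, parseOldDateKey('?date=200306') gives a month result for 2003/06, while a seven-digit value or a missing date parameter gives null.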
In this new MT-powered version, entries are archived as complete physical files in a date-based hierarchical directory structure. The same values referenced above now live in /log/2003/06/ and /log/2003/06/24/, respectively. Based on what I had seen in various places around the web, I was pretty sure Apache’s mod_rewrite module could help forward all requests for the old filename/query string combinations to the new site’s structure. Since I wasn’t familiar with mod_rewrite, it took a little bit of experimentation to get it to work. But the directives required to forward the query string to the new Log Archive index ended up being very simple. These three lines were placed into the .htaccess file inside the /log/ directory:
RewriteEngine on
RewriteBase /log
RewriteRule ^default\.asp$ index.html [NC,R]
The flags at the end of the last line (in square brackets) make the file name match case-insensitive (NC) and force an external redirect (R). Without an explicit status code, R sends a 302 (temporary) redirect.
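As a rough illustration of what those three lines do to a request, here is a small JavaScript model. It is not Apache’s code: the function name applyLogRewrite is hypothetical, and real Apache writes a full http:// URL into the Location header, while this model keeps only the path and query string.

```javascript
// Rough model (not Apache's implementation) of the /log/.htaccess rule.
const REWRITE_BASE = '/log';

function applyLogRewrite(requestPath, queryString) {
  // In .htaccess context the directory prefix ("/log/") is stripped
  // before the pattern is tested, leaving e.g. "Default.ASP".
  const local = requestPath.replace(/^\/log\//, '');

  // ^default\.asp$ with [NC]: exact file name, case-insensitive.
  if (!/^default\.asp$/i.test(local)) return null;

  // Relative substitution plus RewriteBase gives "/log/index.html".
  const target = `${REWRITE_BASE}/index.html`;

  // The substitution contains no "?", so the original query string is kept.
  // [R] alone means a 302 redirect.
  return {
    status: 302,
    location: queryString ? `${target}?${queryString}` : target,
  };
}
```

So a request for /log/Default.ASP?date=200306 would be answered with a redirect to /log/index.html?date=200306. Writing the flag as [NC,R=301] would mark the move as permanent instead.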
Since RewriteRule matches only the path and passes the query string through untouched, the new index file receives the original date value. There I could verify through a regular expression that the date value consisted of exactly six or eight digits, then match the old date IDs to the new directory structure and filenames through a simple (but long) manually created associative array:
// assign months first
$r = array();
$r['200208'] = '/2002/08/';
$r['200209'] = '/2002/09/';
$r['200210'] = '/2002/10/';
// [...]
// now assign days or individual entries
$r['20020819'] = '/2002/08/19/something_new.html';
$r['20020820'] = '/2002/08/20/craving_more_style.html';
$r['20020821'] = '/2002/08/21/news_worth_noting.html';
// [and on and on...]
The manual creation of the array allowed me to decide whether each eight-digit date ID should redirect to a daily or an individual entry archive. Once I found the matching string (via a foreach loop through the array), I needed to prepend the first portion of the URL (http://www.stopdesign.com/log) to it, then redirect any inbound request with a date value in the query string to the new URL for the appropriate month, day or individual entry archive.
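The validate, look up and prepend steps can be sketched in a few lines. This is a JavaScript mirror of the PHP logic, not the site’s actual code: the object redirects stands in for the hand-written $r array (its entries are copied from the list above), and the function name resolveOldDate is hypothetical. It uses a direct key lookup where the entry describes a foreach loop; both find the same entry.

```javascript
// Hypothetical stand-in for the PHP array $r: old date IDs -> new paths.
const redirects = {
  '200208': '/2002/08/',
  '200209': '/2002/09/',
  '20020819': '/2002/08/19/something_new.html',
  '20020820': '/2002/08/20/craving_more_style.html',
};

const LOG_ROOT = 'http://www.stopdesign.com/log';

// Returns the absolute URL to redirect to, or null when the value is
// malformed or has no entry in the table.
function resolveOldDate(dateValue) {
  // Only six (YYYYMM) or eight (YYYYMMDD) digits are accepted.
  if (!/^(\d{6}|\d{8})$/.test(dateValue)) return null;

  // Direct key lookup instead of looping over every entry.
  if (!Object.prototype.hasOwnProperty.call(redirects, dateValue)) return null;

  // Prepend the base of the URL to the stored path.
  return LOG_ROOT + redirects[dateValue];
}
```

In PHP, the resulting URL would then be handed to header('Location: ' . $url) before any output is sent.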
ASP file requests for other sections of the site were handled through simple Apache Redirect directives. Now that these redirects are working, any external links from other sites pointing to old files on stopdesign are automatically redirected to the correct place in the new structure, preventing the unfortunate 404 File Not Found error page (which I still need to customize).
Ok, if you have a programming background, I know, this is simple stuff. But I’m a designer figuring it out for the first time, and I think it’s pretty cool that it works. And that I’m the one who got it working.
Posted in Site, Technology

10 comments (Comments closed)
You may want to consider using some of PHP’s directory reading functions.
You can loop through the contents of the logs directory, ignoring all but directories within. Each directory name could be loaded into a working array and a master array.
Loop through the working array with foreach, reading each of those directories in turn (logs/$value) and adding what you find to a master array of files for redirection.
Recurse through all the levels you need, until you get to the files.
Beats writing the array by hand.
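A minimal sketch of that suggestion, in JavaScript for Node.js rather than PHP (in PHP the same shape could be built from scandir() and is_dir()). fs.readdirSync with the withFileTypes option needs Node 10.10 or later. The function name listFiles is hypothetical, and the sketch assumes the archive follows the YYYY/MM/DD/ layout described in the entry.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively collect every file below `root`, returning paths relative
// to `root` with "/" separators, e.g. "2003/06/24/some_entry.html".
function listFiles(root, rel = '') {
  const results = [];
  const dir = path.join(root, rel);
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const childRel = rel ? `${rel}/${entry.name}` : entry.name;
    if (entry.isDirectory()) {
      // Recursive case: descend into the subdirectory.
      results.push(...listFiles(root, childRel));
    } else if (entry.isFile()) {
      // A plain file ends this branch; no further call is made.
      results.push(childRel);
    }
  }
  return results;
}
```

The recursion stops on its own: a directory with no subdirectories makes no further calls, so the function unwinds once every branch has been read.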
Ah, recursion. That’s a concept that just can’t fit inside my head without a background in programming. I was introduced to it a couple of years ago when designing a web-based app which mirrored a server directory structure. My several hundred lines of conditional if-then statements and for-next loops were rewritten by a colleague in about 5 lines of code. To this day, I still can’t grasp the logic behind recursion, even after he explained it to me several times.
Anyway, about the array: the redirects only need to cover entries up to the point when the site switched over, so the array won’t need to grow dynamically. Now that the array is written, it shouldn’t change, unless I discover a copy/paste error or an old entry title changes.
But good to know PHP has built-in file system functions.
Actually, I’m more interested in how you got MT to output to “index.html” files without all your permanent links including the “index.html” part. I was able to do this for a site I had a while back, but I had to rewrite all the appropriate templates to do so. Did you figure out some easy way of doing this in the backend using a template module instead, or did you have to rewrite all your templates as well?
Nollind: The index.html is hacked off my permanent links with a simple inline search & replace using a regular expression:
<a href="<?php echo preg_replace("#index\.html$#", "", "<$MTArchiveLink$>"); ?>"><$MTArchiveTitle$></a>
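Because Movable Type fills in <$MTArchiveLink$> when the page is built, the PHP call only ever sees an ordinary URL string. The same trim looks like this in JavaScript; the helper name trimIndex is hypothetical and only mirrors the preg_replace call.

```javascript
// Hypothetical helper mirroring the preg_replace call: drop a trailing
// "index.html" so the permalink ends in the directory's slash.
function trimIndex(link) {
  // The dot is escaped so it matches only a literal "."; "$" anchors
  // the match to the end of the string.
  return link.replace(/index\.html$/, '');
}
```

Links to individual entry files, which do not end in index.html, pass through unchanged.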
Oh, that’s a nice trick you have there.
Interesting article, which raises a point I hadn’t thought about regarding rewrite rules. Is it necessary to escape the dots in the source URL?
Since the rewrite engine basically uses regular expression matching, I guess that is so, since a single dot could stand for any single character. It is just something I have never done, nor given any thought.
Thanks - testing begins!
Yes, to properly match a period/decimal character in a rewrite rule, you should escape it with the backslash, otherwise you are using the wildcard character. Fortunately, this will rarely cause a problem.
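A quick way to see the difference: JavaScript regular expressions treat the dot the same way as the PCRE syntax mod_rewrite uses, so the rule’s pattern can be tested with and without the backslash. The sample file names are made up for illustration.

```javascript
// The pattern as written in .htaccess: "\." matches only a literal dot.
const escapedPattern = /^default\.asp$/i;
// The same pattern without the backslash: "." matches any one character.
const unescapedPattern = /^default.asp$/i;

for (const name of ['default.asp', 'defaultXasp', 'default_asp']) {
  console.log(name, escapedPattern.test(name), unescapedPattern.test(name));
}
// default.asp true true
// defaultXasp false true
// default_asp false true
```

Both versions match the real file name, which is why an unescaped dot rarely causes trouble; it only matters when a similar-looking name could also be requested.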
Interesting about the R flag for the rules. I’ve always used QSA (query string append). I guess the result is similar (I’m no Apache guru, by any means).
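The two flags actually do different jobs: R turns the rewrite into an external redirect, while QSA only affects how query strings are combined when the substitution carries its own query string. A rough JavaScript model of that query string handling (not Apache’s code; the function name rewrittenQuery and the sample substitutions are hypothetical):

```javascript
// Rough model of the query string mod_rewrite ends up with.
// `substitution` is the rule's target, which may contain "?query".
function rewrittenQuery(substitution, originalQuery, qsa) {
  const q = substitution.indexOf('?');

  // No "?" in the substitution: the original query string passes through.
  if (q === -1) return originalQuery;

  const subQuery = substitution.slice(q + 1);

  // "?" present without [QSA]: the substitution's query replaces the original.
  if (!qsa || !originalQuery) return subQuery;

  // [QSA]: the original query string is appended to the new one.
  return subQuery ? `${subQuery}&${originalQuery}` : originalQuery;
}
```

For a substitution like index.html with no "?", the original ?date= value survives with or without QSA, so adding QSA to the rule in this entry would not change anything.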
I’ll take a swing at explaining recursion (with a super-simple JavaScript example). Basically, a function that calls itself is a recursive function.
var i = 0;
var max = 6;
function increment()
{
i++;
if ( i < max ) increment();
}
increment();
alert( i );
It should be fairly easy to see that recursion is just another looping mechanism. Yes, it can get much more complicated than this, but maybe this little demo will give you a jumping-off point for understanding recursion. I can come back with something more complicated later if you like.
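A slightly tidier variant of the same demo passes the counter as an argument and returns the result instead of changing a global variable. It performs the same steps as increment() above; the name countUp is hypothetical.

```javascript
// Counts from `i` up to `max`, one step per call, like increment().
function countUp(i, max) {
  const next = i + 1;           // the same step as i++
  if (next < max) {
    return countUp(next, max);  // recursive case: keep going
  }
  return next;                  // base case: stop and hand the value back
}

console.log(countUp(0, 6)); // 6, the same value the alert() shows
```

Each call returns its result to the call before it, so the final value travels back up the chain once the base case is reached.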
-Peter
P.S. Being able to use <pre> elements would be nice for code examples ;)
Glad to see you got everything sorted out, Doug. Speaking of recursion, this quote is priceless:
We do not know any sure way to explain recursion. Our experience is that people stare at recursive programs for a long time without understanding how they work. Then, one day, they suddenly get it — and they don’t understand why they ever thought it was difficult. Evidently, the key to understanding recursion is to begin by understanding recursion. The rest is easy.
- Accelerated C++, p. 134
Haha! That’s a great line, David - and so true!
Doug - to answer your recursion question from the email you sent me - the condition that stops the recursion is one the programmer needs to set up. There is no automagic process for breaking a recursive algorithm. In my example above, it’s the condition
if ( i < max )
If you can’t check something (a value, property, etc.) to make sure the recursion eventually stops, then you shouldn’t use recursion, but a more statically defined loop instead.
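For comparison, here is the recursive increment() example rewritten as an ordinary loop. The stopping test is the same i < max check; it simply sits in the loop condition instead of deciding whether to make another call. The name incrementLoop is hypothetical.

```javascript
// Loop equivalent of the recursive increment() example.
function incrementLoop(start, max) {
  let i = start;
  do {
    i++;               // the step increment() performs on each call
  } while (i < max);   // the check that decided whether to recurse
  return i;
}

console.log(incrementLoop(0, 6)); // 6
```

A do...while is used because increment() always increments once before testing the condition.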
Make a teeny bit more sense now?
Recursion: Keep calling yourself until you meet some criterion, then break out of it all. Useful, but underused.
Anyways, great job switching it over to PHP, it seems flawless. Keep up the great writing. It’s inspired some of us in the Freenet Project to start writing about design and getting people to standardize their stuff.