Entries in Leoville (27)

Friday
Dec 31 1999

Perls of Wisdom (not)

I used to program for fun, but I don't have the time (or the concentration) to write anything substantial any more. It's still fun to hack out one-liners from time to time, though. The other day I decided to keep track of my Amazon.com sales rank on the front page of Leoville. To do this I'd have to write a program in Perl that the web server could call using CGI, the Common Gateway Interface. The program would return the rank, which the web server would embed into my page.

The first iteration of the program was pretty simple, thanks to a Perl library called LWP. The library provides built-in routines to access web pages. Using the LWP routine "get" I can fetch the contents of the Amazon.com page, then use Perl's built-in text search features to extract the ranking. I wrote the program in a few minutes:

    use LWP::Simple;
    my $webpagetext;
    # access Amazon web page
    $webpagetext = get("http://www.amazon.com/exec/obidos/ASIN/0789726912/qid=1007181368/sr=1-6/ref=sr_1_74_6/104-8979567-7976756");
    # find sales rank
    $webpagetext =~ /(Sales Rank: )(\d+)/;
    # output sales rank
    print "Content-type: text/html\n\n"; # this text is required for CGI output
    print $2;

If you're not familiar with Perl, a few things might need explanation. All the real work is done in the line...

    $webpagetext =~ /(Sales Rank: )(\d+)/;

In English this would read something like: search the contents of $webpagetext for the text "Sales Rank: " followed by one or more digits. The parentheses in the phrase (Sales Rank: )(\d+) tell Perl to group the results. Perl assigns the value in the first group to the variable $1, the second group to $2, etc. I'll use $2 later to output the rank.

Finally I print the results to standard output. CGI routes the output back to the web server, which inserts it into the web page that called it. I use Apache's server-side includes (SSI) to call the Perl program and embed the results of the program.
On my system that means putting the line:

    <!--#include virtual="/cgi-bin/ranking.pl" -->

into the web page. When the web server sees it, it calls ranking.pl and sticks the result into the page at that point.

So far, so good. I could run the program locally and it worked fine, but it wouldn't work on my server. Turns out the LWP module was never installed there. I wasn't sure how to get around that until I installed Movable Type. This blog software uses several modules that aren't part of my web host's Perl installation, and from it I learned that I could put the needed modules in a directory on the web host and tell my program to look for them there. Thus adding the line:

    use lib "/cgi-bin/mt/extlib/";

at the beginning of the program, and storing the LWP::Simple module in the extlib directory, fixed the problem, and version 1.0 of my program was up and running.

Worked great, too, until my book's rank slipped past 999. Amazon displays ranks of 1,000 and up with commas, and my program didn't consider that. I changed the search to include commas by substituting the regular expression [0-9,] for \d:

    $webpagetext =~ /(Sales Rank: )([0-9,]+)/;

and it was working again.

Incidentally, I work in Perl on both Windows and Macintosh. On Windows, I use an excellent shareware editor from DZSoft. On Mac OS X I use BBEdit from Bare Bones Software. Both really speed up the development cycle by letting you run the program from within the editor, and both have built-in FTP uploading and a Perl reference.

No program is ever done, and neither was this one. Next time, how I extended it to keep track of the peak scores. (And maybe one of you Perl experts can help me with a bug that's really been buggin' me.)
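If you want to play with the matching step without hitting Amazon every time, here is a minimal sketch of just the extraction. The sub name extract_rank and the sample text are made up for illustration and are not part of ranking.pl, and the whole thing assumes the page still contains the literal text "Sales Rank: " followed by the number. It drops the first pair of parentheses, since only the number is needed, and it checks that the match succeeded before touching $1.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original ranking.pl): pull the sales
# rank out of a chunk of page text. Returns the rank with its commas removed,
# or undef if the text contains no rank.
sub extract_rank {
    my ($text) = @_;

    # Only use $1 when the match succeeds. After a failed match the capture
    # variables still hold whatever an earlier successful match left in them.
    if ($text =~ /Sales Rank: ([0-9][0-9,]*)/) {
        my $rank = $1;
        $rank =~ tr/,//d;    # "1,234" becomes "1234"
        return $rank;
    }
    return;
}

# Made-up sample text standing in for the fetched page.
my $sample = "Amazon.com Sales Rank: 1,234 in Books";
print extract_rank($sample), "\n";    # prints 1234
```

Stripping the commas isn't needed just to display the rank, but it leaves a plain number that can be compared against earlier values.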
Friday
Dec 31 1999

Referring to Referers

Leoville went down earlier today for a few hours. It was out yesterday for about an hour, too. I contacted my excellent web host, Nacio, and they brought it back. Here's what they said...
We have noticed that 2 IP addresses are consistently opening and not closing connections to your website: 12.xx.xx.xxx and 172.xx.xx.xxx. On average, each IP will have 60 open TCP sessions on a 24/7 basis. Are you familiar with these IPs? If not, we may look in to blocking them, as their irregular activity may be part of the problem. Also, we have been monitoring the disk usage on your site. Currently you are using 1.5GB total--roughly 1.4GB being your discussion board. As you are on a shared webserver--geared toward smaller 40 - 80MB sites--we were hoping that you could remove some content from your site. Would 500MB be sufficient, or do you need more?
I asked them to block the two IP addresses and that seems to have helped. I'm going to have to cut back on the disk usage, too. Obviously the boards have gotten way out of control. I'm pruning messages older than 90 days and I'll probably cut the max file size to 50KB. Sorry to have to do that, but I really would like to keep this site running!

On another note, I've been playing with Dave Winer's new Radio blogging software and I have to say it's wonderful. I'm sticking with Movable Type, but for anyone who wants to create their own web site without having to struggle with the tech, this is it. It finally fulfills the web's promise to be the people's publishing platform. You can read the temporary blog I set up to play with it at weblogs.com. It literally took me 10 minutes to get it up and running. And it's free for 30 days - $40 for a year including the web hosting. If you've been thinking about blogging, try Radio.

While playing with Radio, I noticed that the link to referrers is misspelled on the admin page. No blame to Dave Winer for this. The misspelling dates back to the original HTTP spec, which also misspells referrers as "referers." This ancient error caused me endless confusion when I was writing my own Perl referers routine (it's running on Leoville now; to see the most recent 20 referring pages click here). The program failed at first because I kept spelling referrers correctly. It took me a while to figure out where I was going wrong. But the misspelling poses an interesting problem. Do you perpetuate it, as Dave has done, in public, or do you continue to spell it correctly while using the non-traditional spelling inside your programs? I chose the latter route on the front page of Leoville, but I might be in the minority. In fact, this is exactly how a language evolves. I suppose, in time, "referers" will become the correct spelling, all thanks to a small spelling error in the HTTP spec. 
Even though programmers are notoriously bad spellers, I can't think of another instance where a misspelling has become enshrined in a spec. Can you? And you can bet I spell checked this post before submitting it.
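For the curious, here is the trap in miniature. Under CGI the web server hands the header to a script as the environment variable HTTP_REFERER, with the single-r spelling, so a program that asks for the correctly spelled name just gets nothing back. This is a rough sketch and not the routine running on Leoville: the sub name remember_referer and the in-memory list are assumptions for illustration, and a real version would have to save the list to a file between requests.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the routine running on Leoville.
# Put one referring URL at the front of a list (newest first) and
# keep at most $max entries, 20 by default.
sub remember_referer {
    my ($list, $referer, $max) = @_;
    $max ||= 20;

    # Ignore requests that arrived without a referring page.
    return $list unless defined $referer && length $referer;

    unshift @$list, $referer;
    splice(@$list, $max) if @$list > $max;    # drop the oldest entries
    return $list;
}

# Under CGI the server passes the header along in the environment.
# Note the single-r spelling: $ENV{HTTP_REFERRER} would simply be undefined.
my @recent;
remember_referer(\@recent, $ENV{HTTP_REFERER});
print "$_\n" for @recent;
```

Run from the command line, where no such environment variable is set, this prints nothing, which is roughly what my first attempts did on the server.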
Friday
Dec 31 1999

Darn all spammers to heck!

Several of you have emailed me about a spam message you've received with the subject line: Fw: http://www.leoville.com/mt/archives/ I have also received that spam message. The company sending it out, www.trafficbbs.com, is spamming - they have nothing to do with me. Apparently they're harvesting addresses from blog comments. I will attempt to get them to stop but I don't have high hopes. The company slogan is:
Offer you great data of 50,000+ search engines & 120,000+ BBS! Present you to a magic world of instant & effective online communication!
And there's no phone number on the web page. Just a fax number. These things happen all the time. The best defense is not to post your email address anywhere on the web. As long as a page is publicly accessible, a spammer can harvest the addresses on it. Since most of the time they use automated programs to do the harvesting, it's possible to use a human-readable address that confounds the robots. Something like:

    leo at (die spammers die) leoville.com

I'm sorry that that's necessary, but that's the way of the web, alas.
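To see why this trick works, here is a small Perl sketch. Both the helper obfuscate_address and the pattern in $naive_email are made up for illustration; I'm assuming a harvester that looks for something shaped like user@host, and real harvesters vary.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite user@host into a form that people can read
# but that no longer contains an @ sign.
sub obfuscate_address {
    my ($address, $filler) = @_;
    $filler = 'die spammers die' unless defined $filler;
    my ($user, $host) = split /\@/, $address, 2;
    return "$user at ($filler) $host";
}

# A deliberately simple pattern of the kind a harvester might use
# (an assumption, not taken from any real harvesting program).
my $naive_email = qr/[\w.+-]+\@[\w-]+(?:\.[\w-]+)+/;

my $readable = obfuscate_address('leo@leoville.com');
print "$readable\n";    # leo at (die spammers die) leoville.com
print "harvested\n" if $readable =~ $naive_email;    # prints nothing
```

A harvester written with this particular trick in mind could undo it, of course, so it only raises the bar.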
Friday
Dec 31 1999

Growing Pains

Success brings its own problems. When I started this web site in '96 or '97 there wasn't much to it, just a few files and sound effects for download, maybe an occasional editorial (see for yourself). In the last year I've added this blog and a message board. Both features have proved much more popular than I ever expected, taxing my puny little server, and the programs I chose to use, beyond their capabilities. To say nothing of my capabilities. I upgraded the blog to Movable Type and that's been a great improvement. Now the message boards and the server itself seem to be spinning down the great disposal of life. Mike Chandler of Annex.com, a regular around here, has offered me server space on a fresh new Dell box with such nice bells and whistles as PHP and MySQL. I think I'll probably take him up on it. And I'm looking at alternative message board solutions. With access to MySQL I can try some database-backed programs that should be more responsive and better able to handle the heavy traffic that the Town Square has developed. One of the possibilities is InfoPop's UBBThreads. It's being used by some pretty big sites (including Ms. Magazine and Playboy - strange bedfellows). Ars Technica also uses it. If you're a heavy user of the Leoville message boards I'd appreciate your feedback on UBB, and I'd like to hear about any other programs you think I should consider. Moving the server and adopting new software is a major undertaking, and I only want to do this once, so any advice or input you can give me before I begin is much appreciated. Just add your suggestions to the comments here on the blog, if you would. I promise to make this as seamless as possible for you all!
Friday
Dec 31 1999

Selling Out

I've decided to try setting up a little Leoville store with a few items people have asked me for. I've got an autographed mug in there, boxer shorts, sweatshirts, and so on. I can't put copyrighted TechTV images on it, so there's no Screen Savers mug - even though that would be a bestseller - but there's some other fun stuff. The store is powered by Café Press. They do all the printing and fulfillment, for which they take the lion's share of the money, but I'll make a few bucks per item. Let me know how you like the quality of the items, and if there's any other Leoville merchandise you'd like to see in there.