Mike
Mike HalfDork
10/12/12 6:20 p.m.

So, I have this site that I want to monitor. I want to be able to have something send a bunch of emails if certain things happen to the HTML. I'd like it to check every, say, hour or so to see if the HTML has changed. If it hasn't changed, it shouldn't send an email, but logging that it checked would be good.

Any suggestions?

Grtechguy
Grtechguy UltimaDork
10/12/12 6:22 p.m.

Do you own the site?

Many hosts provide this feature.

Brett_Murphy
Brett_Murphy SuperDork
10/12/12 6:35 p.m.

Is somebody trying to guard against Wikipedia edits?

petegossett
petegossett UltraDork
10/12/12 8:07 p.m.

Web Scraper, possibly?

peter
peter HalfDork
10/12/12 9:07 p.m.

easy.

Use wget to grab the webpage as it is now, undisturbed. Get an MD5 or SHA-1 hash of that page and write it down.

Write a bash script that uses wget to fetch the page, hashes it (md5sum or sha1sum), compares that value with your "known good" value, and emails you if they don't match.

Put this script in your crontab to run every N minutes or hours (cron can't schedule anything finer than one minute).

done
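
For anyone who would rather not do this in bash, here is a rough sketch of the same fetch, hash, and compare steps in Python using only the standard library. The function names (`fetch`, `sha1_hex`, `page_changed`) are made up for illustration, and `urllib` stands in for wget; treat it as a starting point under those assumptions, not a finished monitor.

```python
import hashlib
import urllib.request


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download the raw page body (the job wget does in the steps above)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def sha1_hex(body: bytes) -> str:
    """Return the SHA-1 hex digest of the page body."""
    return hashlib.sha1(body).hexdigest()


def page_changed(body: bytes, known_good: str) -> bool:
    """True if the body no longer matches the recorded 'known good' digest."""
    return sha1_hex(body) != known_good
```

You would record `sha1_hex(fetch(url))` once as the known good value, then call `page_changed(fetch(url), known_good)` from a script that cron runs. A crontab line such as `0 * * * * /usr/bin/python3 /path/to/check_page.py` would run it at the top of every hour; the script path here is a placeholder.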

Mike
Mike HalfDork
10/14/12 7:58 a.m.

What I need is really to monitor one tag. I've thought about cron-ing a shell script to wget the page, and then grepping for the tag, but I don't have the grep-fu to check, for example, for the tag being commented out. The page is dynamic, so hashing probably isn't a good strategy.

peter
peter HalfDork
10/14/12 10:41 a.m.

Then what you want is Beautiful Soup, and some Python-fu.
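
To give a feel for the idea, here is a minimal sketch that checks whether one tag is still "live" in the page, treating a commented-out copy as missing. To keep it free of extra installs it swaps in Python's built-in `html.parser` rather than Beautiful Soup itself; the class and function names (`TagFinder`, `tag_is_live`) and the example `id` value are hypothetical. With Beautiful Soup the equivalent check would be roughly `BeautifulSoup(html, "html.parser").find("a", id="buy") is not None`.

```python
from html.parser import HTMLParser


class TagFinder(HTMLParser):
    """Records whether a tag (optionally with one attribute) appears live."""

    def __init__(self, tag, attr_name=None, attr_value=None):
        super().__init__()
        self.tag = tag.lower()
        self.attr_name = attr_name
        self.attr_value = attr_value
        self.found = False

    def handle_starttag(self, tag, attrs):
        # Comments are reported through handle_comment instead, so a tag
        # inside <!-- ... --> never reaches this method.
        if tag != self.tag:
            return
        if self.attr_name is None or (self.attr_name, self.attr_value) in attrs:
            self.found = True


def tag_is_live(html, tag, attr_name=None, attr_value=None):
    """Return True if the tag exists outside of any HTML comment."""
    finder = TagFinder(tag, attr_name, attr_value)
    finder.feed(html)
    finder.close()
    return finder.found
```

Because this parses the markup instead of pattern-matching it, the rest of the page can change on every load without triggering a false alarm, which is the problem with hashing a dynamic page.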

Toyman01
Toyman01 PowerDork
10/14/12 11:25 a.m.

So, what language is this you speak?

Nothing useful, sorry.

BoxheadTim
BoxheadTim PowerDork
10/14/12 11:44 a.m.

For "regular" monitoring I'd use something like nagios. Not sure if there is a plugin that will do what you want, though.

peter
peter HalfDork
10/14/12 12:26 p.m.
BoxheadTim wrote: For "regular" monitoring I'd use something like nagios. Not sure if there is a plugin that will do what you want, though.

I've been assuming that he is not the admin of the page that he is monitoring.

BoxheadTim
BoxheadTim PowerDork
10/14/12 12:32 p.m.

I assumed that too, but IIRC you can point nagios at pretty much any page, admin or not.

GameboyRMH
GameboyRMH PowerDork
10/14/12 1:31 p.m.

Should be easy to do with a shell script or Python script. You just need wget and grep (or equivalents) and some logic to put them together. If you PM me with details I could hack something up for you. Is this going to run from a Windows or *nix machine?
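
As a rough idea of what the "logic to put them together" might look like, here is a Python sketch that logs every check and only sends mail when the tag has gone missing, which matches what the original post asked for. Everything named here (`run_check`, `build_alert`, `send_via_local_smtp`, the example.com addresses) is hypothetical, and the mail step assumes an SMTP server is reachable on localhost.

```python
import logging
import smtplib
from email.message import EmailMessage

log = logging.getLogger("page_monitor")


def build_alert(url, sender, recipients):
    """Build the alert email; addresses are supplied by the caller."""
    msg = EmailMessage()
    msg["Subject"] = f"Page changed: {url}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(f"The monitored tag is missing or changed on {url}.")
    return msg


def run_check(url, tag_present, send,
              sender="monitor@example.com", recipients=("me@example.com",)):
    """Log every check; call send(msg) only when the tag is gone."""
    if tag_present:
        log.info("checked %s: no change", url)
        return None
    log.warning("checked %s: tag missing, sending alert", url)
    msg = build_alert(url, sender, list(recipients))
    send(msg)
    return msg


def send_via_local_smtp(msg, host="localhost", port=25):
    # Assumption: a mail server is listening on host:port.
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

Passing the sender function in as `send` keeps the check testable without a mail server. In real use you would call `logging.basicConfig(filename=..., level=logging.INFO)` once, then `run_check(url, tag_present, send_via_local_smtp)` from cron on *nix or Task Scheduler on Windows.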
