In this project, we'll show you how to use Python to monitor a website from your Raspberry Pi.
This Python script will monitor a website and tell you when it goes down or changes.
It does this by saving a simple copy of the webpage locally and comparing each new response against that copy.
Because this website monitor is so lightweight, it should run just fine on a Raspberry Pi Zero.
Throughout this guide, we'll walk you through writing your own script to monitor a website. With this knowledge, you should be able to adapt the script to your own requirements.
If you don't want to learn how the code works, you can skip to the section titled "Periodically run your Raspberry Pi Website Monitor". However, you'll still need to make some changes to the code for email notifications to work.
Because it doesn't require an interface, this project is ideal for a headless Raspberry Pi.
Although this guide focuses on the Raspberry Pi, the code will run on any device that supports Python 3. That means you can run this script on a Windows device if you like.
Equipment
The equipment we used to set up a script to monitor URLs on our Raspberry Pi is listed below.
Installing the Website Monitor on your Raspberry Pi
Before we begin, we need to make sure that we have all of the packages required to run our website monitoring script.
These steps include making sure that Python 3 and the required Python packages are installed.
1. First, we'll update the package list and upgrade any existing packages.
To do this, open the terminal on your Raspberry Pi and run the following two commands: "sudo apt update", followed by "sudo apt upgrade".
2. Next, we need to make sure that Python 3 and its package manager, "pip", are installed on our device.
To make sure both are installed, run a command such as "sudo apt install python3 python3-pip".
3. Before we can write the script that monitors our websites, the required Python packages must be installed.
Install the requests, beautifulsoup4, and lxml packages, for example with "pip3 install requests beautifulsoup4 lxml".
Coding the Website Monitor on your Raspberry Pi
Now that we've installed all of the necessary packages, we can start writing our basic website monitor. Each part of the script is broken down into its own section so you can understand how everything works.
While you can write this code in the nano text editor, we prefer using a good IDE such as Visual Studio Code.
Please review our Python guides if you want to learn more before continuing.
1. On your Raspberry Pi, start writing the Python script that will monitor a website. For this example, we'll name the script "websitemonitor.py".
If you want, you can start writing this script in nano with the command "nano websitemonitor.py".
Python code for a simple website monitor
Before we build a more sophisticated website monitor script for our Raspberry Pi, let's start with the simplest solution.
In this section, we'll write a basic script that fetches a webpage, compares it to the cached copy if one exists, and prints a message if there's a difference.
1. As always, begin the script by importing the packages we'll be using. We need three packages to get started: "os", "sys", and "requests".
os: This package lets you interact with the operating system.
In our case, we'll use it to work with a cache of our most recent website request. We'll check this cache to see whether anything has changed.
sys: We'll use the sys package to read any arguments passed to the script. In our case, the user will be able to provide the website URL as well as a name for the cache.
requests: The requests package lets Python make HTTP requests.
We'll use it to fetch and store the contents of a given website.
Creating the has_website_changed() Function
2. Next, we'll write a function to handle the majority of our logic.
This function will be called "has_website_changed" and takes two arguments.
The first argument ("website_url") is the address of the website. This is where our GET request will be made.
The second argument ("website_name") is a short name for the website. It is used to build the cache filename.
This function will return one of three states: -1 if the website is "not OK", 0 if the website hasn't changed, and 1 if the website has changed.
Remember that indentation is crucial in Python. Make sure to keep the indentation as we fill out this function.
Defining the Headers for our Python Request
3. Now that we've created our function, we can begin adding actual functionality.
We'll begin by specifying the headers that the requests package will send when requesting the website. We set two things with these headers.
The first is the "User-Agent" header. Feel free to change this to suit your needs. We keep ours fairly simple.
Second, we set the "Cache-Control" header to "no-cache", which asks that neither the request path nor the end server serve a cached copy. Not all web servers will honour this header.
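As a rough sketch, the two headers described above could be written as a plain dictionary. The User-Agent string here is a made-up placeholder rather than the value from the original script, so swap in whatever suits your setup.

```python
# Headers sent with every request made by the monitor.
# The User-Agent value is a hypothetical placeholder; change it to suit your needs.
headers = {
    "User-Agent": "Mozilla/5.0 (compatible; WebsiteMonitor/1.0)",
    # Ask servers and intermediaries not to answer from a cache.
    "Cache-Control": "no-cache",
}
```

Inside the script, this dictionary would sit at the top of the "has_website_changed()" function so it is ready for the request made in the next step.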
Making a Request to the Website
4. We can now use the requests package to retrieve the passed-in URL with the headers we just defined.
This is one of the most important lines in our Raspberry Pi website monitor script since it retrieves the current state of the website.
This line calls the get method of the "requests" package. We pass in our "website_url" variable and our "headers". The result of this request is stored in our "response" variable.
5. After retrieving our response, we check that the website returned an "OK" result.
This simply means verifying that the status code is not less than 200 and not more than 299.
If it falls outside of this range, we return -1, indicating that the website is not OK.
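Steps 4 and 5 might look something like the sketch below. The helper name "fetch_page", the placeholder User-Agent, and the timeout are our own assumptions for illustration; in the article's script this logic lives directly inside "has_website_changed()", which returns -1 where this helper returns None.

```python
import requests

# Hypothetical header values; see the previous step for what each one does.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; WebsiteMonitor/1.0)",
    "Cache-Control": "no-cache",
}


def fetch_page(website_url):
    """Return the page text, or None when the status code is outside 200-299."""
    # The timeout is an addition of ours so a stalled server cannot hang the script.
    response = requests.get(website_url, headers=HEADERS, timeout=10)

    # Anything outside the 2XX range is treated as "not OK".
    if response.status_code < 200 or response.status_code > 299:
        return None

    return response.text
```

Note that a site that is completely unreachable makes requests.get() raise an exception instead of returning a status code, so you may want to catch that case as well.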
Saving the Response Text and the Cache Filename
6. Now that we have checked the response to make sure we received a good status code, let's create two more variables.
The first is "response_text", which temporarily stores the text from the response we just retrieved.
This variable can be used to modify the response text before saving it. For now, though, our Raspberry Pi website monitor will use the response text as is.
Second, we create a variable named "cache_filename", which holds the name of our cache file. This filename is built by joining the "website_name" variable and "_cache.txt".
So, if our website name was "pimylifeup", the filename would be "pimylifeup_cache.txt".
Creating the Cache for a New Website
7. When you first run the script, a cache file for the current website URL may not exist yet.
We use the "path.exists()" function from the "os" package to see whether this cache file already exists.
If the file does not exist, we create it by opening our cache filename with the "w" option. The current response text is then written to the file, ready for our Raspberry Pi to watch for changes on the website.
We return 0 since this is the first request, indicating that the response hasn't changed.
Reading the Cached Response from a Previous Request
8. If the code reaches this point, we need to open the cache file and read its contents into the variable "previous_response_text".
This time we pass "r+" to the open function, which opens our cache file for both reading and writing.
Reading the file moves the stream position to the end, so we must use the "seek()" method to return it to the beginning. This makes it easy to truncate the file if there is a new response.
Checking Whether the Response Text Matches the Cached Response
9. Now that we have both the new response text and the previous one, we can compare them.
If the text in both responses matches, we close the file handle and return 0. As mentioned earlier, a value of 0 means the responses are the same and no changes have occurred.
This, together with the else clause that follows, is the final part of the "has_website_changed" function in our Raspberry Pi website monitor.
Caching the New Response if it is Different
10. If the responses don't match, our Raspberry Pi has detected a change while monitoring the website.
To begin, we truncate the file at its current position, which should be position 0 at this point.
Once the file has been truncated, we write the new response to it. After the write is finished, we can close the file handle because it is no longer required.
We return 1 to indicate that the response has changed.
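The caching logic from steps 7 to 10 can be sketched as a single helper. The name "update_cache" is hypothetical, and we use "with" blocks instead of closing the file handle by hand; the article's script does the same work inline in "has_website_changed()".

```python
import os


def update_cache(cache_filename, response_text):
    """Return 0 if the text matches the cache (or no cache existed yet), 1 if it changed."""
    # First run: there is nothing to compare against, so create the cache.
    if not os.path.exists(cache_filename):
        with open(cache_filename, "w", encoding="utf-8") as file_handle:
            file_handle.write(response_text)
        return 0

    # "r+" opens the existing file for both reading and writing.
    with open(cache_filename, "r+", encoding="utf-8") as file_handle:
        previous_response_text = file_handle.read()

        if response_text == previous_response_text:
            return 0

        # Reading moved the stream position to the end of the file, so go
        # back to the start, drop the old contents, and write the new text.
        file_handle.seek(0)
        file_handle.truncate()
        file_handle.write(response_text)
        return 1
```

Calling update_cache("pimylifeup_cache.txt", response_text) twice with the same text returns 0 both times; a later call with different text returns 1 and replaces the cached copy.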
Writing the main() Function
Next, we need to write the main function for our Raspberry Pi website monitor. This function is called whenever the script is executed.
This section of the script is simple and mostly deals with calling the function we just wrote.
11. First, let's define the main function.
The core logic of our Raspberry Pi's website monitor will be called from within this function.
Checking the Website for Changes
12. We can now call the "has_website_changed()" function. We use the "sys" package to pass the first and second command-line arguments into it.
The first argument is the website URL. The second is the name used for the cache file.
The result returned by this function is stored in the variable "website_status".
Printing a Response Based on the Website Status
13. Now that we have the website status stored in our variable, we can use it to print a message.
This is the last main part of our Raspberry Pi website monitoring script. Later, this part can be extended to send an email or a text message.
It is a simple if / elif statement that prints a different message depending on the status.
14. Finally, we can complete our script by adding the code that calls the main function when the script is executed.
This simple if statement ensures that main() only runs when the script is executed directly rather than imported as a Python module.
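Steps 11 to 14 could be sketched roughly as follows. To keep the example runnable on its own, "has_website_changed()" is replaced with a stand-in that always reports "no change", and the "status_message" helper, the message wording, and the usage check are our own additions rather than part of the original script.

```python
import sys


def has_website_changed(website_url, website_name):
    """Stand-in for the function built in the earlier steps; always reports 'no change'."""
    return 0


def status_message(website_status):
    """Translate the -1 / 0 / 1 status into a printable message."""
    if website_status == -1:
        return "Non 2XX response while fetching"
    elif website_status == 0:
        return "Website is the same"
    elif website_status == 1:
        return "Website has changed"
    return "Unknown status"


def main(argv):
    # argv[1] is the URL to monitor and argv[2] is the name used for the cache.
    if len(argv) < 3:
        print("Usage: websitemonitor.py <url> <cache name>")
        return

    website_status = has_website_changed(argv[1], argv[2])
    print(status_message(website_status))


if __name__ == "__main__":
    # Only run main() when executed as a script, not when imported as a module.
    main(sys.argv)
```

Passing the argument list into main() makes the function easy to exercise from other code, although reading "sys.argv" directly inside main() works just as well.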
The Basic Code in its Final Form
15. We can now save the script and put it through its paces. The completed code should look like the example below.
If you are using the nano text editor, save your work by pressing CTRL + X, then Y, then ENTER.
Testing the Basic Website Monitor on our Raspberry Pi
Now that we've written our website monitor, we can run it on our Raspberry Pi. The steps below will help us confirm that the script is working properly.
1. We must first make sure that our website monitor script has execute permission.
Running a command such as "chmod +x websitemonitor.py" on the Raspberry Pi will give this permission to the website monitoring script.
2. We can now run the script because it has the proper permissions.
When running this script, you'll need to know the URL you want to monitor and the name you want to use for the cache.
In this example, we'll use "https://pimylifeup.com/" as our monitor URL and "pimylifeup" as our cache name.
3. Use the ls command in the terminal to check that the website monitor created the cache file.
A file ending in "_cache.txt" should appear. In our example, this was "pimylifeup_cache.txt".
4. If you run this script again, you may notice a problem right away. Because some websites set content dynamically, each response may be different even if the visible content hasn't changed.
If you use our website, for example, you will see that the current code always reports it as "changed".
The next section shows how we can use Python's Beautiful Soup package to clean up the result and remove anything that would cause the website to be reported as "changed" unnecessarily.
Using beautifulsoup to improve the Raspberry Pi Website Monitor
In this section, we'll use beautifulsoup to improve how our Raspberry Pi monitors websites.
Beautifulsoup is a powerful Python package that lets us easily manipulate HTML. For example, we can use it to remove unnecessary content like style and script tags.
1. For this section, you'll need to modify the script we wrote earlier.
To begin, we'll add a new import to the top of the script. This imports the BeautifulSoup class from the bs4 package.
Writing the New cleanup_html() Function
We can now start on our new function. This function will be used to clean up any HTML returned by the requests package.
This will help our Raspberry Pi behave more consistently when monitoring websites.
2. Add the following line to the file to define this new function.
This function takes a single parameter: the HTML text we want to clean up.
Creating a New BeautifulSoup Object
3. Here we create a new BeautifulSoup object. We pass in the HTML string that we wish to clean up as the first argument.
The second argument specifies the parser that will handle our HTML. We choose lxml because it is fast and provides all of the features we need.
On a Raspberry Pi, where resources are limited, faster and more efficient code is always a benefit for something like a website monitor.
Using BeautifulSoup to clean up the HTML
4. We can now put BeautifulSoup to work by parsing the HTML and removing particular tags.
We find and remove the "script", "style", and "meta" tags using a few for loops and BeautifulSoup's "select" method.
You'll see that we call the ".extract()" method in each loop. This method removes the matched HTML element from the document.
Returning the BeautifulSoup Object as a String
5. Finally, once BeautifulSoup has finished processing the HTML our website monitor script retrieved, we can return the result.
You can't just return the soup object as it is. To convert it to a standard string, we need to use the "str()" function.
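Putting steps 2 to 5 together, the cleanup function might look like the sketch below. It assumes the beautifulsoup4 and lxml packages from the installation section are available, and the loop variable name is our own choice.

```python
from bs4 import BeautifulSoup


def cleanup_html(html):
    """Strip script, style, and meta tags so dynamic content doesn't trigger false changes."""
    # Parse the HTML with the lxml parser.
    soup = BeautifulSoup(html, features="lxml")

    # extract() removes each matched element from the parsed document.
    for element in soup.select("script"):
        element.extract()

    for element in soup.select("style"):
        element.extract()

    for element in soup.select("meta"):
        element.extract()

    # Convert the soup object back into a plain string for caching.
    return str(soup)
```

The three loops could be collapsed into one by passing "script, style, meta" to select(), but keeping them separate makes it easy to add or remove a tag type later.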
Cleaning up the Retrieved HTML Response
Now that we have our "cleanup_html()" function, we need to change another part of the script to use it.
6. Find and replace the following line of code.
This changes the script so that the response text is no longer stored blindly and is instead passed through our new function.
Look for the line of code below. It should be in the "has_website_changed()" function.
Replace that line with the following.
Saving the New Extended Website Monitor
7. After you've made all of the above changes to the file, the code should look like this.
If you're using nano, make sure you save the file by pressing CTRL + X, then Y, then ENTER.
8. Now you can run the script again. The results should be much more consistent this time. Removing the "script", "style", and "meta" tags should reduce the odds of a false positive.
If you're running this example against our website, not every request should be reported as "changed" any more.
Adding Email Support to the Raspberry Pi Website Monitor
A website monitor on our Raspberry Pi isn't much use if it doesn't give you some form of notification when a website changes.
In this section, we'll extend the script so that it sends an email whenever it detects a change. Please note that you will need to know the SMTP details for whichever service handles your email.
We'll use Gmail's SMTP server details as an example.
Importing a New Package
We need to import another package to make SMTP connections in Python. Fortunately, this package is bundled with Python.
1. Add the following line at the top of the script's list of imports.
This line imports the "smtplib" package, allowing us to easily make SMTP connections.
Defining Constants for Email Data Storage
Near the top of the Python script, we need to define some constants. The following lines must be added below the other "import" statements.
While Python does not support true constants, we follow the convention of naming these variables in capital letters. These values should not be changed at runtime.
SMTP_USER
2. This constant defines the login username for your SMTP connection. If you use Gmail, this is the email address you log in with.
The value stored in this constant is used when logging in to the SMTP server.
SMTP_PASSWORD
3. In this constant, you must define the password for the account making the SMTP connection.
If you use Gmail, this is your account password. If you have 2FA enabled (which you should), you'll need to create an application password instead.
SMTP_HOST
4. The "SMTP_HOST" constant specifies the IP address or hostname to use for the SMTP connection.
We'll use Gmail's SMTP connection details as an example.
SMTP_PORT
5. This constant defines the port that our Raspberry Pi website monitor will connect to when it sends an email after detecting a change.
In the example below, we use Gmail's port for implicit SSL (port 465).
SMTP_SSL
6. These days, most email services support SSL or TLS. Our code only supports implicit SSL, not STARTTLS.
Make sure the following constant is set to True to enable this support. Set this value to False if you wish to disable SSL.
SMTP_FROM_EMAIL
7. Next, we specify the email address that the email will be sent from. This must be an address that you are set up to send from.
If you use Gmail, for example, this must be an email address connected with your account. If you use a transactional mail service like Mailgun, you must have that address and domain name configured there.
SMTP_TO_EMAIL
8. The final constant to set is the one that specifies where the script should send the email.
Fill in the email address where you want to receive notifications about changes to the website.
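For reference, the seven constants described in steps 2 to 8 might be defined as shown below. The addresses and password are obvious placeholders that you must replace with your own details; the host and port are Gmail's implicit SSL settings.

```python
# Placeholder credentials: replace these with your own account details.
SMTP_USER = "example@gmail.com"
SMTP_PASSWORD = "PASSWORD"

# Gmail's SMTP server, using the implicit SSL port.
SMTP_HOST = "smtp.gmail.com"
SMTP_PORT = 465
SMTP_SSL = True

# Placeholder sender and recipient addresses.
SMTP_FROM_EMAIL = "example@gmail.com"
SMTP_TO_EMAIL = "sendto@example.com"
```

Keeping a real password in the script means anyone who can read the file can read the password, so consider restricting the file's permissions.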
Writing our email_notification() Function
Now that we've declared all of the required constants, we can write our "email_notification()" function. This function is in charge of connecting to the SMTP server and sending the email.
9. First, let's define our new function. It takes two parameters. The first sets the subject line of the email, and the second is the message itself.
Establishing an SMTP Connection
10. We begin this function by establishing an SMTP connection. To support both SSL and unencrypted connections, we need two different calls separated by an if statement.
If "SMTP_SSL" is set to True, an SSL connection to the SMTP server is established and stored in the "smtp_server" variable.
When SSL is disabled, we do much the same with a plain SMTP connection. In both cases, we pass the "SMTP_HOST" and "SMTP_PORT" constants when setting up the connection.
Logging in to the SMTP Server
11. Now that we have a connection to the SMTP server, we can start the process of sending an email.
The first thing we do is send the server an "ehlo" message. This tells the server a variety of things, but we won't go into detail.
The next step is to log in to the server. We pass the username and password stored in "SMTP_USER" and "SMTP_PASSWORD" to the login method.
Creating an Email Format
12. Next, we'll build the email that will be sent over the SMTP connection.
This is one of the most important parts, since it is how our Raspberry Pi will inform you when a website change is detected.
You can change the format to your liking. Just make sure the "From", "To", and "Subject" lines are formatted correctly, with each one on its own line.
There must be a single blank line before the body of your email begins.
Sending the Email
13. We finish this function by sending the email over our SMTP connection.
We pass the address stored in "SMTP_FROM_EMAIL", the address stored in "SMTP_TO_EMAIL", and lastly the email message we prepared earlier into this method call.
After the email has been sent, we close the SMTP connection.
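Steps 9 to 13 could be assembled along the lines of the sketch below. The constants repeat the placeholder values from earlier so the block stands on its own, and splitting the message formatting into a separate "build_email" helper is our own choice to make the format easy to inspect.

```python
import smtplib

# Placeholder values; replace with your own SMTP details.
SMTP_USER = "example@gmail.com"
SMTP_PASSWORD = "PASSWORD"
SMTP_HOST = "smtp.gmail.com"
SMTP_PORT = 465
SMTP_SSL = True
SMTP_FROM_EMAIL = "example@gmail.com"
SMTP_TO_EMAIL = "sendto@example.com"


def build_email(subject, message):
    """Format the headers, a single blank line, then the body."""
    return "From: %s\nTo: %s\nSubject: %s\n\n%s" % (
        SMTP_FROM_EMAIL, SMTP_TO_EMAIL, subject, message)


def email_notification(subject, message):
    """Connect to the SMTP server, log in, send one email, and disconnect."""
    # Implicit SSL when enabled, otherwise a plain unencrypted connection.
    if SMTP_SSL:
        smtp_server = smtplib.SMTP_SSL(SMTP_HOST, SMTP_PORT)
    else:
        smtp_server = smtplib.SMTP(SMTP_HOST, SMTP_PORT)

    smtp_server.ehlo()
    smtp_server.login(SMTP_USER, SMTP_PASSWORD)
    smtp_server.sendmail(SMTP_FROM_EMAIL, SMTP_TO_EMAIL, build_email(subject, message))
    smtp_server.quit()
```

When the message is passed to sendmail() as a string, smtplib encodes it as ASCII, so a subject or body containing other characters would need to be encoded first.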
Adding Calls to the email_notification() Function
Now that we've written the function we need, we have to call it from within the code.
In this section, we'll make changes to the "main()" function.
14. In your Python script, locate the following line.
Above it, add the line shown below.
With this line in place, the script will send an email when the Raspberry Pi website monitor runs into a problem connecting to the website.
15. We should also add a line so that we are notified when the website changes.
Within your script, locate the following line. It should be a little below the previous line you found.
Above that, add the following line. This line sends an email whenever the website is detected as having changed.
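The placement of the two new calls can be sketched as below. The "report_status" wrapper, the subject lines, and the message wording are all hypothetical, and "email_notification()" is replaced with a stand-in that prints instead of sending so the example runs without an SMTP server.

```python
def email_notification(subject, message):
    """Stand-in for the real function so this sketch runs without an SMTP server."""
    print("[email] %s: %s" % (subject, message))


def report_status(website_status, website_url):
    """Print a message for each status and send an email for errors and changes."""
    if website_status == -1:
        # Notify when the website could not be fetched successfully.
        email_notification("An Error has Occurred", "Error while fetching " + website_url)
        print("Non 2XX response while fetching")
    elif website_status == 0:
        # No email is sent when nothing has changed.
        print("Website is the same")
    elif website_status == 1:
        # Notify when the website content differs from the cached copy.
        email_notification("A Change has Occurred", website_url + " has changed.")
        print("Website has changed")
```

In the real script, these calls sit directly inside "main()", each one just above the matching print statement.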
The Code in Its Final Form
16. Once you've made all of the changes, your script's code should look like the code below.
Before going any further, remember to save your code.
17. If everything is working properly, your website monitor should now send you email notifications.
It will send an email whenever it runs and detects a change in the website. It will also send an email if that website goes down or returns a non-2XX status code.
Periodically run your Raspberry Pi Website Monitor
Now that you've written the Python script to monitor websites, you'll want to run it on a regular basis.
While you could add an infinite loop to the script to have it run continuously, we'll use a simple cron job instead.
1. Make sure our Python script is on your Raspberry Pi before continuing.
You can get the script from our GitHub repository if you skipped the preceding sections. Just make sure that the details for your SMTP connection are filled in.
This section of the guide assumes that the script is located in the home directory of the "pi" user (/home/pi/).
2. We can now modify the crontab to have our script run every minute.
To edit the current user's crontab, use the command "crontab -e".
When asked which text editor you want to use, we recommend "nano".
3. At the bottom of the file, add a line for the script, for example "* * * * * python3 /home/pi/websitemonitor.py https://pimylifeup.com/ pimylifeup".
You'll need to tweak the command slightly so that it points to the URL you wish to monitor. You must also give your cache a name. The cache name can be any string; its sole purpose is to distinguish this request from others.
4. Press CTRL + X, then Y, followed by the ENTER key to save the changes to the crontab.
5. Your Raspberry Pi will now check the specified website every minute.
If the script detects a change in the website's content, it will notify you.
Conclusion
Throughout this guide, we've shown you how to set up a simple website monitor on your Raspberry Pi.
Every time this Python script runs, it retrieves the most recent version of the supplied URL. The response is then compared with the cached copy to see whether anything has changed since the last request.
Using an external email server, this script can send you an email whenever the URL changes or becomes unavailable.
Please leave a comment below if you have any problems with this website monitor script.