If you are a frequent reader of this blog, you might already know that I created a small tool, called `worker-planet`, that generates a simple webpage plus an RSS feed from the content of multiple other RSS sources.
This type of tool is often known as a “planet”:

> In online media a planet is a feed aggregator application designed to collect posts from the weblogs of members of an internet community and display them on a single page.
>
> — Wikipedia
While the tool is open-source, you need to deploy it yourself before you can see it in action. Not great.
This brings us to last week. I was reading a recent issue of a popular newsletter when I found an OPML file containing 101 infosec-related sources curated by someone else.
Instead of adding them to my newsreader, which, to be honest, already contains a lot of cruft that I never read and should remove anyway, I saw a great fit: build a demo site for `worker-planet`.
Preparing the sources
The first step was to extract all the valid sources from that file. This is important because many of the items might no longer be working, or online at all, since the file is more than two years old.
A quick Python script can help us with this task:
```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Extract existing URLs from the OPML file
urls = []
tree = ET.parse(opml_file)
for element in tree.getroot().iter("outline"):
    if url := element.get("xmlUrl"):
        urls.append(url)

# Make sure they are working: the feed must respond with a
# success status and contain parseable XML
def check_feed(url):
    try:
        response = urlopen(url)
        if 200 <= response.status < 300:
            body = response.read().decode("utf-8")
            ET.fromstring(body)
            return url
    except Exception:
        pass

working_urls = []
with ThreadPoolExecutor(max_workers=20) as executor:
    for result in executor.map(check_feed, urls):
        if result:
            working_urls.append(result)
```
As expected, of the 101 sources present in the file, only 54 seem to be working.
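With the working list in hand, it can be serialized into the `FEEDS` variable used later in the configuration. Here's a minimal sketch, assuming the feeds are passed as a single comma-separated string (check `worker-planet`'s documentation for the exact expected format); the URLs below are placeholders:

```python
# Placeholder URLs standing in for the 54 working feeds
working_urls = [
    "https://example.com/feed.xml",
    "https://example.org/rss",
]

# Join into a single string suitable for the FEEDS variable in
# wrangler.toml; the comma separator is an assumption, not a
# confirmed format
feeds_value = ",".join(working_urls)
print(f'FEEDS = "{feeds_value}"')
```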
Deploying
Now that we have the inputs we need, it is time to set up and deploy our `worker-planet`.
Assuming there aren’t any customizations, we just have to copy `wrangler.toml.example` to a new `wrangler.toml` file and fill in the configs as desired. Here’s the one I used:
```toml
name = "infosecplanet"
main = "./worker/script.js"
compatibility_date = "2023-05-18"
node_compat = true
account_id = "<my_id>"
workers_dev = true

kv_namespaces = [
    { binding = "WORKER_PLANET_STORE", id = "<namespace_id_for_prod>", preview_id = "<namespace_id_for_dev>" },
]

[vars]
FEEDS = "<all the feed urls>"
MAX_SIZE = 100
TITLE = "InfoSec Planet"
DESCRIPTION = "A collection of diverse security content from a curated list of sources. This website also serves as a demo for \"worker-planet\", the software that powers it."
CUSTOM_URL = "https://infosecplanet.ovalerio.net"
CACHE_MAX_AGE = "300"

[triggers]
crons = ["0 */2 * * *"]
```
Then `npm run build` plus `npm run deploy`. And it is done: the new planet should now be accessible through my `workers.dev` subdomain.
All that remains is to wait for the cron job to execute and to configure any custom routes/domains on Cloudflare’s dashboard.
The final result
The new “InfoSec Planet” is available at https://infosecplanet.ovalerio.net and lists the latest content from those infosec-related sources. A unified RSS feed is also available.
In the coming weeks, I will likely refine the list of sources to improve the overall quality of the content.
One thing I would like to highlight is that I took special care not to include the full content of the feeds in InfoSec Planet’s output.
I did it this way because I didn’t ask all those authors for permission to include the contents of their public feeds on the page, so only a small snippet is shown together with the title.
Nevertheless, if an author wishes to have their public feed removed from the page, I will gladly do so once notified (by email?).