After seeing this advisory about an issue that should really have been caught by a cursory source code check, I wanted to see if there is any other low-hanging fruit in WordPress plugins.
Instead of my normal process, a mixture of black box testing and proper source code analysis, I decided to download a large number of WordPress plugins and run a couple of grep searches against them, without any deeper analysis.
I used the following script, which might be of interest to others who want to perform automated source code audits on a large number of WordPress plugins.
# usage: python wordpress_mass_plugin_downloader.py > wordpress_plugins_download_links.txt && wget -i wordpress_plugins_download_links.txt
import sys
import re
import requests  # requires the requests lib

startPage = 1
pages = 10

requestSession = requests.session()

# get links to individual plugin pages
# (the regexes below match wordpress.org's HTML markup at the time of writing)
pluginNames = []
for x in range(startPage, pages + 1):  # +1 so that all of `pages` listing pages are fetched
    # all:
    response = requestSession.get("https://wordpress.org/plugins/browse/popular/page/" + str(x) + "/").text
    # for a specific keyword:
    # response = requestSession.get("https://wordpress.org/plugins/search/[keyword]/page/" + str(x) + "/").text
    pluginNames = pluginNames + re.findall("<a href=\"https://wordpress.org/plugins/(.*?)\" rel=\"bookmark\">", response)

uniquePluginNames = list(set(pluginNames))

# visit each plugin page to get the download link
# download links go to stdout, errors to stderr so the redirected link list stays clean
for pluginName in uniquePluginNames:
    try:
        response = requestSession.get("https://wordpress.org/plugins/" + pluginName + "/").text
        print(re.search("<a class=\"plugin-download button download-button button-large\" href=\"(.*?)\">Download</a>", response, re.DOTALL).group(1))
    except (requests.RequestException, AttributeError):
        print("error: " + pluginName, file=sys.stderr)
It’s a quick and dirty script that uses regex to parse HTML, but it does the job.