Getting app store ratings without authentication

Both the Apple iTunes and Google Play app stores provide extensive APIs for accessing all sorts of metrics and analytics about your apps. This is great, but they require authentication–which is a bit of a pain when you’re trying automate.

So I went looking for a quick and dirty way to get current app ratings from public endpoints.

iTunes Connect

As it turns out, Apple provide a “real” API for this:

$ curl -s https://itunes.apple.com/au/lookup?id=<my-app-id> | jq .results[].averageUserRating

Thanks to user chaoscoder on Stack Overflow for this answer, and also to alberto-m for the “country” tip in this reply, for setting me in the right direction.

BIG FAT CAVEAT: Specifying a country implies that you are only getting the ratings for that country. As far as I can tell, there is no unauthenticated way to get the aggregate rating for all countries (other than enumarating all countries in which your app is for sale, and getting them one by one).

Google Play

Sadly, Google do not seem to provide a similar public API. They do make the ratings available on the app’s store page however, and it appears to be wrapped in a div tag with a distinct aria-label attribute value:

<div class="BHMmbe" aria-label="Rated 3.5 stars out of five stars">3.5</div>

We can use this fact, and the fact the page is well formatted/validated HTML, to apply an XPath query to extract the rating:

$ curl -s 'https://play.google.com/store/apps/details?id=<my-app-id>&hl=en' | \
      xmllint --nowarning --html --xpath '//div[starts-with(@aria-label, "Rated")]/text()' - 2>/dev/null

EVEN BIGGER FATTER CAVEAT: this is super fragile and makes all sorts of assumptions about how the Play store renders its web pages, is completely unauthorised by Google, and could break at any time. YMMV etc.

Wrapping it up in a script

We can put all that together in a bit of Python that can be run as a scheduled job, and maybe ship its logs to Splunk for later analysis:

#!/usr/bin/env python

import sys, urllib, json, logging
from lxml import etree

apple_app_id = "my-app-id"
google_app_id = "my-app-id"

def setup_custom_logger(name):
    formatter = logging.Formatter(fmt="%(asctime)s [%(levelname)s] %(pathname)s %(message)s", datefmt="%Y-%m-%d %H:%M:%S")
    handler = logging.StreamHandler(stream=sys.stdout)
    handler.setFormatter(formatter)
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return logger

def get_itunes_rating(app_id):
    response = urllib.urlopen("https://itunes.apple.com/au/lookup?id=%s" % app_id)
    try:
        data = json.loads(response)
        return data["results"][0]["averageUserRating"]
    except (TypeError, ValueError, KeyError, IndexError):
        logger.warn("Response did not contain expected JSON: %s" % response)
        return ""

def get_google_play_rating(app_id):
    response = urllib.urlopen("https://play.google.com/store/apps/details?id=%s&hl=en" % app_id)
    try:
        data = etree.HTML(response)
        return data.xpath("//div[starts-with(@aria-label, \"Rated\")]/text()")[0]
    except (ValueError, IndexError):
        logger.warn("Response did not contain expected XML: %s" % response)
        return ""

if __name__ == '__main__':
    logger = setup_custom_logger("get-app-ratings.py")
    logger.info("apple_store_rating=%s" % get_itunes_rating(apple_app_id))
    logger.info("google_store_rating=%s" % get_google_play_rating(google_app_id))

which produces output like:

$ ./get-app-ratings.py
2019-04-08 17:19:57 [INFO] ./get-app-ratings.py apple_store_rating=3.5
2019-04-08 17:19:58 [INFO] ./get-app-ratings.py google_store_rating=3.5

Getting that stdout output into Splunk is left as an exercise for the reader…