Getting app store ratings without authentication
Both the Apple iTunes and Google Play app stores provide extensive APIs for accessing all sorts of metrics and analytics about your apps. This is great, but they require authentication, which is a bit of a pain when you’re trying to automate things.
So I went looking for a quick and dirty way to get current app ratings from public endpoints.
iTunes Connect
As it turns out, Apple provide a “real” API for this:
$ curl -s<my-app-id> | jq .results[].averageUserRating
Thanks to Stack Overflow user chaoscoder for this answer, and to alberto-m for the “country” tip in this reply, for pointing me in the right direction.
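The jq filter in the command above, `.results[].averageUserRating`, pulls the rating out of each entry in the response's `results` array. A minimal Python equivalent, assuming a response of that shape (the sample payload below is made up for illustration, not real API output), might look like:

```python
import json

def average_ratings(payload: str) -> list:
    """Mirror jq's .results[].averageUserRating over a JSON string."""
    data = json.loads(payload)
    # Unlike jq (which prints null), results without a rating are skipped
    return [r["averageUserRating"] for r in data.get("results", [])
            if "averageUserRating" in r]

# Hypothetical payload, shaped like the one the jq filter expects
sample = '{"results": [{"averageUserRating": 3.5}]}'
print(average_ratings(sample))  # → [3.5]
```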
BIG FAT CAVEAT: Specifying a country means you only get the rating for that country. As far as I can tell, there is no unauthenticated way to get the aggregate rating across all countries (other than enumerating every country in which your app is for sale and fetching them one by one).
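If you do go down the enumeration route, note that a plain mean of the per-country averages would over-weight small markets: each country's average has to be weighted by how many ratings it is based on. The sketch below assumes you have already collected an (average, count) pair per country, for example from a rating-count field in each lookup result (something like `userRatingCount`, but treat that name as an assumption and check your own responses). The country codes and figures are invented.

```python
from typing import Dict, Optional, Tuple

def aggregate_rating(per_country: Dict[str, Tuple[float, int]]) -> Optional[float]:
    """Combine per-country (average, count) pairs into one weighted average."""
    total = sum(count for _, count in per_country.values())
    if total == 0:
        return None  # no ratings in any country, so there is nothing to average
    weighted = sum(avg * count for avg, count in per_country.values())
    return weighted / total

# Invented figures: country code -> (average rating, number of ratings)
ratings = {"us": (4.0, 300), "gb": (3.0, 100)}
print(aggregate_rating(ratings))  # → 3.75
```

A naive mean of 4.0 and 3.0 would report 3.5 here, even though three quarters of the ratings come from the higher-rated market.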
Google Play
Sadly, Google do not seem to provide a similar public API. They do, however, make the rating available on the app’s store page, where it appears to be wrapped in a div tag with a distinctive aria-label attribute value:
<div class="BHMmbe" aria-label="Rated 3.5 stars out of five stars">3.5</div>
We can use this fact, and the fact that the page parses cleanly enough as HTML, to apply an XPath query that extracts the rating:
$ curl -s '<my-app-id>&hl=en' | \
xmllint --nowarning --html --xpath '//div[starts-with(@aria-label, "Rated")]/text()' - 2>/dev/null
EVEN BIGGER FATTER CAVEAT: this is super fragile and makes all sorts of assumptions about how the Play store renders its web pages, is completely unauthorised by Google, and could break at any time. YMMV etc.
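If xmllint isn't available, roughly the same extraction can be done with Python's standard-library `html.parser`, swapping the XPath query for a small event-driven parser that also tolerates markup which isn't well formed. It inherits every caveat above: the `Rated` prefix on the `aria-label` is an assumption about Google's current markup, and the input below is just the snippet quoted earlier.

```python
from html.parser import HTMLParser

class RatingParser(HTMLParser):
    """Collect the text of the first <div> whose aria-label starts with 'Rated'."""

    def __init__(self):
        super().__init__()
        self._inside = False  # True while inside the matching div
        self.rating = None    # the text found, if any

    def handle_starttag(self, tag, attrs):
        # Valueless attributes come through as None, hence the `or ""`
        label = dict(attrs).get("aria-label") or ""
        if tag == "div" and self.rating is None and label.startswith("Rated"):
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "div":
            self._inside = False

    def handle_data(self, data):
        # Keep the first non-blank text seen inside the matching div
        if self._inside and data.strip():
            self.rating = data.strip()
            self._inside = False

def extract_rating(html: str):
    parser = RatingParser()
    parser.feed(html)
    parser.close()
    return parser.rating

snippet = '<div class="BHMmbe" aria-label="Rated 3.5 stars out of five stars">3.5</div>'
print(extract_rating(snippet))  # → 3.5
```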
Wrapping it up in a script
We can put all that together in a bit of Python that can be run as a scheduled job, and maybe ship its logs to Splunk for later analysis:
#!/usr/bin/env python3
import json
import logging
import sys
from urllib.request import urlopen

from lxml import etree

apple_app_id = "my-app-id"
google_app_id = "my-app-id"

# The store URLs are elided in this post: fill in the endpoints from the curl
# commands above, with a %s where the app id goes.
ITUNES_URL = ""
GOOGLE_PLAY_URL = ""

def setup_custom_logger(name):
    formatter = logging.Formatter(fmt="%(asctime)s [%(levelname)s] %(pathname)s %(message)s",
                                  datefmt="%Y-%m-%d %H:%M:%S")
    handler = logging.StreamHandler(stream=sys.stdout)
    handler.setFormatter(formatter)  # use the format above for stdout output
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)    # without this, INFO messages are dropped
    logger.addHandler(handler)
    return logger

def get_itunes_rating(app_id):
    body = urlopen(ITUNES_URL % app_id).read()
    try:
        data = json.loads(body)
        return data["results"][0]["averageUserRating"]
    except (TypeError, ValueError, KeyError, IndexError):
        logger.warning("Response did not contain expected JSON: %s", body)
        return ""

def get_google_play_rating(app_id):
    body = urlopen(GOOGLE_PLAY_URL % app_id).read()
    try:
        # Same XPath query as the xmllint command above
        data = etree.HTML(body)
        return data.xpath('//div[starts-with(@aria-label, "Rated")]/text()')[0]
    except (AttributeError, ValueError, IndexError, etree.LxmlError):
        logger.warning("Response did not contain expected HTML: %s", body)
        return ""

if __name__ == '__main__':
    logger = setup_custom_logger("")
    logger.info("apple_store_rating=%s", get_itunes_rating(apple_app_id))
    logger.info("google_store_rating=%s", get_google_play_rating(google_app_id))
which produces output like:
$ ./
2019-04-08 17:19:57 [INFO] ./ apple_store_rating=3.5
2019-04-08 17:19:58 [INFO] ./ google_store_rating=3.5
Getting that stdout output into Splunk is left as an exercise for the reader…
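As a starting point for that exercise: because the script logs `key=value` pairs, its lines are easy to pick apart, and key=value is a format that log tools such as Splunk can typically extract fields from. A minimal stdlib parser, run over the two sample lines above (script path omitted, as it is there), might look like:

```python
import re

# A word-character key, "=", then a value containing no whitespace
PAIR = re.compile(r"(\w+)=(\S+)")

def parse_ratings(lines):
    """Collect every key=value pair from the given log lines into one dict."""
    fields = {}
    for line in lines:
        for key, value in PAIR.findall(line):
            fields[key] = value
    return fields

log = [
    "2019-04-08 17:19:57 [INFO] ./ apple_store_rating=3.5",
    "2019-04-08 17:19:58 [INFO] ./ google_store_rating=3.5",
]
print(parse_ratings(log))  # → {'apple_store_rating': '3.5', 'google_store_rating': '3.5'}
```

Note that an empty value (which the script logs when a lookup fails) does not match `\S+`, so that key is simply absent from the result rather than mapped to an empty string.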