In this post you will learn to create a Python script that lets users connect to their Google Analytics account and retrieve information from it.
In order to do that, we will create a Project in the Google Developers Console and authorize it to use the Analytics API.
Next, we will use the OAuth 2.0 protocol to allow users to connect to their Analytics account through our Project.
And finally, we will retrieve the number of sessions of our view, segmented by traffic source.
Let’s start!
Create a Project in Google Developers Console
Go to the Google Developers Console and log in with your account.
Click on Create Project, enter your project name and, if you want, choose your project ID.
Next, on your new project menu, go to APIs & auth –> Credentials. Here, in the OAuth section, click on Create new Client ID.
In this case, as we are creating a script that will run on our computer, we will choose Installed application as the application type, and Other as the installed application type.
Finally, click on Create Client ID.
You will see, next to the OAuth section, the credentials for your project, which contain your Client ID, the Client Secret, and the redirect URIs. Click on Download JSON to download them, and save the file as client_secrets.json.
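Before moving on, it can help to confirm that the downloaded file has the shape we expect. The sketch below assumes that the file for an installed application keeps its fields under a top-level "installed" key. The helper names missing_client_secret_fields and load_client_secrets are hypothetical, not part of any Google library.

```python
import json

# Fields mentioned above: Client ID, Client Secret and redirect URIs.
REQUIRED_FIELDS = ("client_id", "client_secret", "redirect_uris")


def missing_client_secret_fields(secrets, app_type="installed"):
    """Return the required fields that are absent from a parsed secrets dict."""
    section = secrets.get(app_type)
    if not isinstance(section, dict):
        # The whole section is missing, so every field is missing too.
        return list(REQUIRED_FIELDS)
    return [field for field in REQUIRED_FIELDS if field not in section]


def load_client_secrets(path="client_secrets.json"):
    """Read the JSON file downloaded from the Developers Console."""
    with open(path) as handle:
        return json.load(handle)
```

Calling missing_client_secret_fields(load_client_secrets()) should return an empty list if the download went well.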
From here, go to APIs & auth –> Consent screen and personalize the message that your users will see when requesting access to their accounts.
Next, we need to activate the Google Analytics API in your Project. Go to APIs & auth –> APIs and look for the Analytics API. You just need to activate it by clicking the OFF button on the right.
OK! Now that we have our Project created, we can move on to our Python script!
The Google API Python client library
In order to use the Analytics API with Python, we will use the Google API Python Client library. You can install it in your working environment using pip, ideally inside a virtual environment created with virtualenv or virtualenvwrapper.
$ pip install python-gflags
$ pip install -U google-api-python-client
We also install the python-gflags library, which older oauth2client helpers rely on for command-line flags; the script below uses argparse instead.
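To confirm that the installation worked, a quick check like the following may help. It is only a sketch: the import names are assumptions (google-api-python-client is expected to install the googleapiclient package), and check_installed is a hypothetical helper.

```python
import importlib.util

# Assumed import names of the libraries used by the script in this post.
MODULES = ("googleapiclient", "oauth2client", "httplib2")


def check_installed(modules=MODULES):
    """Map each module name to True if Python can find it, else False."""
    return {name: importlib.util.find_spec(name) is not None
            for name in modules}


for name, found in check_installed().items():
    print(name, "OK" if found else "MISSING")
```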
Next, create the file analytics_service_object.py in your working directory (where client_secrets.json is located). This file will create an authorized Analytics Service object, used to interact with the user’s analytics accounts.
import httplib2
from apiclient.discovery import build
from oauth2client.client import flow_from_clientsecrets
from oauth2client.file import Storage
from oauth2client import tools
import argparse

CLIENT_SECRETS = 'client_secrets.json'

# The Flow object to be used if we need to authenticate.
FLOW = flow_from_clientsecrets(
    CLIENT_SECRETS,
    scope='https://www.googleapis.com/auth/analytics.readonly',
    message='%s is missing' % CLIENT_SECRETS
)

# A file to store the access token
TOKEN_FILE_NAME = 'credentials.dat'


def prepare_credentials():
    parser = argparse.ArgumentParser(parents=[tools.argparser])
    flags = parser.parse_args()
    # Retrieve existing credentials
    storage = Storage(TOKEN_FILE_NAME)
    credentials = storage.get()
    # If no credentials exist, we create new ones
    if credentials is None or credentials.invalid:
        credentials = tools.run_flow(FLOW, storage, flags)
    return credentials


def initialize_service():
    # Create an http object and authorize it using
    # the function prepare_credentials()
    http = httplib2.Http()
    credentials = prepare_credentials()
    http = credentials.authorize(http)
    # Build the Analytics Service Object with the authorized http object
    return build('analytics', 'v3', http=http)


if __name__ == '__main__':
    service = initialize_service()
In the previous script:
- CLIENT_SECRETS is the name of the client_secrets.json file, from which the FLOW object loads your project credentials.
- TOKEN_FILE_NAME is the file where the user-specific credentials will be stored (this file also includes some project-specific credentials).
- the prepare_credentials() function tries to load the credentials from the TOKEN_FILE_NAME and if they don’t exist it creates new ones using the run_flow function.
- the initialize_service() function uses the credentials to build an authorized Analytics Service object, and returns this object.
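The load-or-create logic of prepare_credentials() is a general caching pattern. The sketch below reproduces it without any Google library, so you can try it in isolation: a plain JSON file stands in for Storage, and the hypothetical run_flow callback stands in for the browser-based authorization step. The "invalid" key is also an assumption made for this illustration.

```python
import json
import os


def prepare_token(token_file, run_flow):
    """Return a cached token if a valid one exists, else create and store one."""
    token = None
    if os.path.exists(token_file):
        with open(token_file) as handle:
            token = json.load(handle)

    # Same test as in prepare_credentials(): a missing or invalid token
    # means we have to ask the user again.
    if token is None or token.get("invalid", False):
        token = run_flow()
        with open(token_file, "w") as handle:
            json.dump(token, handle)
    return token
```

The second call with the same file returns the stored token without invoking run_flow, which is why the consent screen only shows up the first time.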
Now, when you type
$ python analytics_service_object.py
you will see, in a browser window, the consent screen you customized before. This means that your Project is asking your permission to access your Analytics account through the API. After clicking yes, your new credentials will be stored in TOKEN_FILE_NAME so that you won’t have to enter them again (except when the access_token expires).
The Analytics Service object
Once we have an authorized Analytics service object, we can use it to retrieve all the data in the user’s analytics accounts.
For example, to get a list of all the existing accounts of the user, just type:
accounts = service.management().accounts().list().execute()
This will give you a dictionary containing the following keys:
- username: the email address of the user
- kind: analytics#accounts
- items: a list of the user’s accounts.
- totalResults
- itemsPerPage
- startIndex
As we will see, this is a common structure when getting data from analytics, even when we ask for properties or views instead of accounts (the returned object has the same keys).
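The totalResults, itemsPerPage and startIndex keys suggest that long lists come back in pages. Here is a minimal sketch of how all pages could be collected, assuming startIndex is 1-based. The function fetch_all_items and its list_page argument are hypothetical: list_page is any callable that takes start_index and max_results and returns a response dictionary like the one above.

```python
def fetch_all_items(list_page, page_size=1000):
    """Collect the 'items' of every page of a paginated list response."""
    items = []
    start_index = 1  # assumed to be 1-based, like startIndex in the response
    while True:
        response = list_page(start_index=start_index, max_results=page_size)
        page = response.get("items", [])
        items.extend(page)
        start_index += len(page)
        # Stop when the page is empty or everything has been collected.
        if not page or len(items) >= response.get("totalResults", 0):
            return items
```

With the real service, list_page could wrap service.management().accounts().list(start_index=..., max_results=...).execute(), assuming the list method accepts those two parameters.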
Moreover, the items value is a list of accounts, each of which is in turn a dictionary with keys:
- id: your account id
- kind: analytics#account
- childLink
- created
- permissions
- selfLink
- updated
Therefore, you can get a list of the user’s account ids with:
def get_accounts_ids(service):
    accounts = service.management().accounts().list().execute()
    ids = []
    if accounts.get('items'):
        for account in accounts['items']:
            ids.append(account['id'])
    return ids
You can also see the account ids in the Google Analytics web interface. Go to the Admin tab and open the top-left drop-down menu. There, your different accounts will be displayed, with their id on the right.
But as you may know, each Account can have multiple Properties, each of which has a different tracking ID. To obtain a list of the Properties inside the Account with an id of account_id, you can use:
webproperties = service.management().webproperties().list(
    accountId=account_id).execute()
where webproperties is a dictionary with the same keys as accounts, but in which
- kind: analytics#webproperties
- items: list of web properties for this account
Again, each web property is a dictionary that contains the keys:
- id: the web property id
- kind: analytics#webproperty
and many more (you can print the webproperties object to see its keys).
You’ll see that the web property id is the tracking ID of this property, which you can also obtain in the Google Analytics Admin tab.
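Since the web property id embeds the account id, you can split it to cross-check the two. This is a small sketch that assumes the usual "UA-<account id>-<property index>" format; split_web_property_id is a hypothetical helper.

```python
import re

# Assumed format: "UA-<account id>-<property index>", e.g. "UA-12345-1".
TRACKING_ID = re.compile(r"^UA-(\d+)-(\d+)$")


def split_web_property_id(web_property_id):
    """Return (account_id, property_index), or raise ValueError."""
    match = TRACKING_ID.match(web_property_id)
    if match is None:
        raise ValueError("unexpected web property id: %r" % web_property_id)
    return match.group(1), match.group(2)
```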
But there is another level! Inside each Property there can be multiple views! You can obtain a list of views (or profiles) of each web property with:
profiles = service.management().profiles().list(
    accountId=firstAccountId,
    webPropertyId=firstWebpropertyId).execute()
The profiles dictionary contains the same keys as accounts and webproperties, but with
- kind: analytics#profiles
- items: list of profiles for this account and web property
and each profile has:
- id: the profile id
- kind: analytics#profile
- name: the profile name
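Putting the three levels together, one way to list every view the user can access is to walk accounts, then properties, then profiles. The sketch below only uses the calls shown above. The function name list_all_profiles is hypothetical, and it issues one request per account and per property, so it may be slow for users with many of them.

```python
def list_all_profiles(service):
    """Return (account_id, web_property_id, profile_id, profile_name) tuples."""
    management = service.management()
    results = []
    accounts = management.accounts().list().execute()
    for account in accounts.get("items", []):
        properties = management.webproperties().list(
            accountId=account["id"]).execute()
        for prop in properties.get("items", []):
            profiles = management.profiles().list(
                accountId=account["id"],
                webPropertyId=prop["id"]).execute()
            for profile in profiles.get("items", []):
                results.append((account["id"], prop["id"],
                                profile["id"], profile["name"]))
    return results
```

The profile id in the third position is the value you will need in the next section.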
Get the number of Sessions of a Google Analytics View
Now that we know how to get information about our accounts, properties and views, let’s obtain the number of sessions of a view during a period of time.
Create the file get_sessions.py and write:
from analytics_service_object import initialize_service


def get_sessions(service, profile_id, start_date, end_date):
    ids = "ga:" + profile_id
    metrics = "ga:sessions"
    data = service.data().ga().get(
        ids=ids, start_date=start_date, end_date=end_date, metrics=metrics
    ).execute()
    return data["totalsForAllResults"][metrics]


if __name__ == '__main__':
    service = initialize_service()
    profile_id = "your_profile_id"
    print(get_sessions(service, profile_id, "2014-09-01", "2014-09-30"))
Note: you have to add your view id in “your_profile_id”, and then, run this script with:
$ python get_sessions.py
Check all the options of the service.data().ga().get() method, and retrieve all the data you want from your view!
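As a starting point for exploring those options, here is a sketch of a helper that assembles the keyword arguments for get(). The name build_query is hypothetical, and the optional sort and max_results parameters are assumptions based on the Core Reporting API v3 query parameters, so check them against the reference before relying on them.

```python
def build_query(profile_id, start_date, end_date, metrics,
                dimensions=None, sort=None, max_results=None):
    """Build keyword arguments for service.data().ga().get(**kwargs)."""
    query = {
        "ids": "ga:" + profile_id,
        "start_date": start_date,
        "end_date": end_date,
        # Several metrics or dimensions go in one comma-separated string.
        "metrics": ",".join(metrics),
    }
    if dimensions:
        query["dimensions"] = ",".join(dimensions)
    if sort:
        query["sort"] = sort
    if max_results is not None:
        query["max_results"] = max_results
    return query
```

You would then call service.data().ga().get(**build_query(...)).execute() in place of the explicit arguments used in get_sessions().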
Get the number of Sessions for each traffic source
Obtaining the number of sessions for each traffic source (i.e. organic, referral, social, direct, email and other) is a little bit trickier. You have to work with filters in order to segment your data.
Here’s a little script that does this, thanks to Michael for the update 🙂
from analytics_service_object import initialize_service


def get_source_group(service, profile_id, start_date, end_date):
    ids = "ga:" + profile_id
    metrics = "ga:sessions"
    dimensions = "ga:channelGrouping"
    data = service.data().ga().get(
        ids=ids, start_date=start_date, end_date=end_date, metrics=metrics,
        dimensions=dimensions).execute()
    # "rows" may be missing from the response when the period has no data
    return dict(
        data.get("rows", []) +
        [["total", data["totalsForAllResults"][metrics]]])


if __name__ == '__main__':
    service = initialize_service()
    profile_id = "your_profile_id"
    start_date = "2014-09-01"
    end_date = "2014-09-30"
    data = get_source_group(service, profile_id, start_date, end_date)
    for key, value in data.items():
        print(key, value)
Again, add your view’s id in “your_profile_id”, and change the start_date and end_date to match the time interval you want.
After running this script, you’ll see the desired information in your terminal.
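If you query more than one dimension or metric, turning each row into a dictionary keyed by column name can make the result easier to work with. The sketch below assumes the response carries a columnHeaders list (with name and dataType entries) next to rows, and that values arrive as strings. The helper rows_to_dicts is hypothetical.

```python
def rows_to_dicts(data):
    """Turn a response into a list of {column name: value} dictionaries."""
    headers = data.get("columnHeaders", [])
    records = []
    # "rows" is assumed to be absent when the query matches nothing.
    for row in data.get("rows", []):
        record = {}
        for header, value in zip(headers, row):
            # Values are assumed to be strings; convert integer columns.
            if header.get("dataType") == "INTEGER":
                value = int(value)
            record[header["name"]] = value
        records.append(record)
    return records
```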
Another way to get the number of sessions by traffic source, less optimized but instructive, is to use filters instead of dimensions:
from analytics_service_object import initialize_service

# Negation of each filter in source_filters. Since "," (OR) binds more
# tightly than ";" (AND), the negated "direct" filter repeats the source
# condition in both groups.
not_source_filters = {
    "social": "ga:hasSocialSourceReferral==No",
    "organic": "ga:medium!=organic",
    "direct": ("ga:source!=(direct),ga:medium!=(none);"
               "ga:source!=(direct),ga:medium!=(not set)"),
    "email": "ga:medium!=email",
    "referral": "ga:medium!=referral,ga:hasSocialSourceReferral!=No"
}

source_filters = {
    "social": "ga:hasSocialSourceReferral==Yes",
    "organic": "ga:medium==organic",
    "direct": "ga:source==(direct);ga:medium==(none),ga:medium==(not set)",
    "email": "ga:medium==email",
    "referral": "ga:medium==referral;ga:hasSocialSourceReferral==No",
    # "other" is everything that matches none of the sources above
    "other": "%s;%s;%s;%s;%s" % (
        not_source_filters["social"],
        not_source_filters["organic"],
        not_source_filters["direct"],
        not_source_filters["email"],
        not_source_filters["referral"])
}


def get_source_sessions(service, profile_id, start_date, end_date, source):
    ids = "ga:" + profile_id
    metrics = "ga:sessions"
    filters = source_filters[source]
    data = service.data().ga().get(
        ids=ids, start_date=start_date, end_date=end_date, metrics=metrics,
        filters=filters).execute()
    return data["totalsForAllResults"][metrics]


if __name__ == '__main__':
    service = initialize_service()
    profile_id = "your_profile_id"
    start_date = "2014-09-01"
    end_date = "2014-09-30"
    for source in ["social", "organic", "direct", "email", "referral", "other"]:
        print(source, get_source_sessions(
            service, profile_id, start_date, end_date, source))
Again, add your view’s id in “your_profile_id”, and change the start_date and end_date to match the time interval you want.
After running this script, you’ll see the desired information in your terminal.
Some information you may find useful when working with filters:
- , means OR
- ; means AND
- == means exact match
- != means does not match
- =@ means contains substring
- !@ means does not contain substring
- Learn more in the Google Reference Guide.
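To get a feel for how these operators combine, here is a small local evaluator that checks a filter string against one row of dimension values. It is only an illustration under stated assumptions, not how Google evaluates filters: it covers just the four operators listed above, it treats OR as binding more tightly than AND, and it ignores escaped commas or semicolons inside values. The names matches and _condition_holds are hypothetical.

```python
import re

OPERATORS = {
    "==": lambda actual, expected: actual == expected,
    "!=": lambda actual, expected: actual != expected,
    "=@": lambda actual, expected: expected in actual,
    "!@": lambda actual, expected: expected not in actual,
}

CONDITION = re.compile(r"(ga:\w+)(==|!=|=@|!@)(.*)$")


def matches(filter_string, row):
    """Evaluate a filter against a dict such as {"ga:medium": "organic"}."""
    # ";" (AND) separates groups; "," (OR) separates the conditions inside
    # a group, so every group needs at least one condition that holds.
    for group in filter_string.split(";"):
        if not any(_condition_holds(cond, row) for cond in group.split(",")):
            return False
    return True


def _condition_holds(condition, row):
    name, operator, expected = CONDITION.match(condition).groups()
    return OPERATORS[operator](row.get(name, ""), expected)
```

For example, the "direct" filter from the script above matches a row with source (direct) and medium (none), but not a row whose source is google.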
That’s all for today! 🙂
Please +1 if this was useful and share it with your friends! Thanks!
Marina Mele has experience in artificial intelligence implementation and has led tech teams for over a decade. On her personal blog (marinamele.com), she writes about personal growth, family values, AI, and other topics she’s passionate about. Marina also publishes a weekly AI newsletter featuring the latest advancements and innovations in the field (marinamele.substack.com)