Use the Google Analytics API with Python

Posted on October 7, 2014 (updated June 1, 2015) by Marina Mele

In this post you will learn to create a Python script that lets users connect to their Google Analytics account and retrieve information from it.

In order to do that, we will create a Project in the Google Developers Console and authorize it to use the Analytics API.

Next, we will use the OAuth 2.0 protocol to allow users to connect to their Analytics account through our Project.

And finally, we will retrieve the number of sessions of our view, segmented by traffic source.

Let’s start!

Create a Project in Google Developers Console

Go to the Google Developers Console and log in with your account.

Click on Create Project, enter your Project name and, if you want, choose your project ID.

Next, in your new project’s menu, go to APIs & auth –> Credentials. There, in the OAuth section, click on Create new Client ID.

In this case, as we are creating a script that will run on our computer, we will choose Installed application as the application type, and Other as the installed application type.

Finally, click on Create Client ID.

You will see, next to the OAuth section, the credentials for your project, which contain your Client ID, the Client Secret, and the redirect URIs. Click on Download JSON to download them, and save the file as client_secrets.json.
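As a quick sanity check, the downloaded file can be inspected before going further. The sketch below assumes the usual layout for installed applications, where the values sit under a top-level "installed" key. The helper name missing_client_secret_keys and the sample values are made up for illustration.

```python
import json

# Keys this sketch expects under the top-level "installed" entry (an
# assumption about the downloaded file's layout, not an official schema).
EXPECTED_KEYS = ('client_id', 'client_secret', 'redirect_uris')


def missing_client_secret_keys(secrets):
    """Return the expected keys that are absent from a parsed secrets dict."""
    installed = secrets.get('installed', {})
    return [key for key in EXPECTED_KEYS if key not in installed]


# Made-up sample; with a real file you would parse client_secrets.json instead.
sample = json.loads("""
{"installed": {"client_id": "1234.apps.googleusercontent.com",
               "client_secret": "not-a-real-secret",
               "redirect_uris": ["urn:ietf:wg:oauth:2.0:oob", "http://localhost"]}}
""")
print(missing_client_secret_keys(sample))
```

An empty list means the three values mentioned above are all present.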

From here, go to APIs & auth –> Consent screen and personalize the message that your users will see when requesting access to their accounts.

Next, we need to activate the Google Analytics API in your Project. Go to APIs & auth –> APIs and look for the Analytics API. You just need to activate it by clicking the OFF button on the right.

OK! Now that we have created our Project, we can move on to our Python script!

The Google API Python client library

In order to use the Analytics API with Python, we will use the Google API Python Client library. You can install it in your working environment using pip (how? learn to install Python, virtualenv and virtualenvwrapper to work with virtual environments).

$ pip install python-gflags
$ pip install -U google-api-python-client

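We also install the python-gflags library, which some oauth2client versions rely on for command-line flags. The script below reads its flags with argparse, through tools.argparser.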

Next, create the file analytics_service_object.py in your working directory (where client_secrets.json is located). This file will create an authorized Analytics Service object, used to interact with the user’s analytics accounts.

import httplib2
from apiclient.discovery import build
from oauth2client.client import flow_from_clientsecrets
from oauth2client.file import Storage
from oauth2client import tools
import argparse

CLIENT_SECRETS = 'client_secrets.json'

# The Flow object to be used if we need to authenticate.
FLOW = flow_from_clientsecrets(
    CLIENT_SECRETS,
    scope='https://www.googleapis.com/auth/analytics.readonly',
    message='%s is missing' % CLIENT_SECRETS
    )

# A file to store the access token
TOKEN_FILE_NAME = 'credentials.dat'


def prepare_credentials():
    parser = argparse.ArgumentParser(parents=[tools.argparser])
    flags = parser.parse_args()
    # Retrieve existing credentials
    storage = Storage(TOKEN_FILE_NAME)
    credentials = storage.get()
    # If no credentials exist, we create new ones
    if credentials is None or credentials.invalid:
        credentials = tools.run_flow(FLOW, storage, flags)
    return credentials


def initialize_service():
    # Create an http object and authorize it using
    # the function prepare_credentials()
    http = httplib2.Http()
    credentials = prepare_credentials()
    http = credentials.authorize(http)
    # Build the Analytics Service Object with the authorized http object
    return build('analytics', 'v3', http=http)

if __name__ == '__main__':
    service = initialize_service()

In the previous script:

  • CLIENT_SECRETS is the path to the client_secrets.json file, from which FLOW loads your project credentials.
  • TOKEN_FILE_NAME is the file where the user-specific credentials will be stored (this file also includes some project-specific credentials).
  • the prepare_credentials() function tries to load the credentials from the TOKEN_FILE_NAME file, and if they don’t exist or are invalid it creates new ones using the run_flow function.
  • the initialize_service() function uses the credentials to build an authorized Analytics Service object, and returns this object.

Now, when you type

$ python analytics_service_object.py

you will see, in a browser window, the consent screen you customized before. This means that your Project is asking your permission to access your Analytics account through the API. After you accept, your new credentials will be stored in the TOKEN_FILE_NAME file, so you won’t have to authorize the script again unless the stored credentials become invalid.

The Analytics Service object

Once we have an authorized Analytics service object, we can use it to retrieve all the data in the user’s analytics accounts.

For example, to get a list of all the existing accounts of the user, just type:

accounts = service.management().accounts().list().execute()

This will give you a dictionary containing the following keys:

  • username: the email address of the user
  • kind: analytics#accounts
  • items: a list of the user’s accounts.
  • totalResults
  • itemsPerPage
  • startIndex

As we will see, this is a common structure when getting data from analytics, even when we ask for properties or views instead of accounts (the returned object has the same keys).

Moreover, the items value is a list of accounts, each of which is in turn a dictionary with keys:

  • id: your account id
  • kind: analytics#account
  • childLink
  • created
  • permissions
  • selfLink
  • updated

Therefore, you can get a list of the user’s account ids with:

def get_accounts_ids(service):
    accounts = service.management().accounts().list().execute()
    ids = []
    if accounts.get('items'):
        for account in accounts['items']:
            ids.append(account['id'])
    return ids

You can also see the account ids in the Google Analytics web interface. Go to the Admin tab and open the top-left drop-down menu. There, your different accounts will be displayed, with their id on the right.
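The totalResults, itemsPerPage and startIndex keys listed earlier suggest that long lists come back one page at a time. Here is a minimal sketch of how the pages could be walked. The name iter_items and the fetch_page callable are hypothetical, and the in-memory fake_fetch only imitates the response shape described above.

```python
def iter_items(fetch_page):
    """Yield every item across the pages of a list response.

    fetch_page is a hypothetical callable that takes a 1-based start
    index and returns one response dictionary.
    """
    start_index = 1
    while True:
        response = fetch_page(start_index)
        items = response.get('items', [])
        for item in items:
            yield item
        start_index += len(items)
        # Stop on an empty page or once every result has been seen.
        if not items or start_index > response.get('totalResults', 0):
            break


# In-memory stand-in for the API: five made-up accounts, two per page.
fake_accounts = [{'id': str(n)} for n in range(5)]


def fake_fetch(start_index):
    page = fake_accounts[start_index - 1:start_index + 1]
    return {'items': page, 'totalResults': len(fake_accounts)}


print([account['id'] for account in iter_items(fake_fetch)])
```

With the real service, fetch_page could wrap service.management().accounts().list(...).execute(), assuming the client exposes the API’s start-index parameter as a start_index keyword argument.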

But as you may know, each Account can have multiple Properties, each of which has a different tracking code. To obtain a list of the Properties inside the Account with an id of account_id, you can use:

webproperties = service.management().webproperties().list(
    accountId=account_id).execute()

where webproperties is a dictionary with the same keys as accounts, but in which

  • kind: analytics#webproperties
  • items: list of web properties for this account

Again, each web property is a dictionary that contains the keys:

  • id: the web property id
  • kind: analytics#webproperty

and many more (you can print the webproperties object to see its keys).

You’ll see that the web property id is the tracking code of this property, which you can also obtain in the Google Analytics Admin tab.

But there is another level! Inside each Property there can be multiple views! Given an account id (firstAccountId) and a web property id (firstWebpropertyId), you can obtain the list of views (or profiles) of that web property with:

profiles = service.management().profiles().list(
    accountId=firstAccountId,
    webPropertyId=firstWebpropertyId).execute()

The profiles dictionary contains the same keys as accounts and webproperties, but with

  • kind: analytics#profiles
  • items: list of profiles for this account and web property

and each profile has:

  • id: the profile id
  • kind: analytics#profile
  • name: the profile name
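Putting the three levels together, the calls above can be nested to collect every view the user has access to. This is only a sketch: the function name list_all_profiles is made up, and it relies solely on the items, id and name keys described in this post.

```python
def list_all_profiles(service):
    """Return (account_id, web_property_id, profile_id, profile_name) tuples."""
    found = []
    accounts = service.management().accounts().list().execute()
    for account in accounts.get('items', []):
        webproperties = service.management().webproperties().list(
            accountId=account['id']).execute()
        for webproperty in webproperties.get('items', []):
            profiles = service.management().profiles().list(
                accountId=account['id'],
                webPropertyId=webproperty['id']).execute()
            for profile in profiles.get('items', []):
                found.append((account['id'], webproperty['id'],
                              profile['id'], profile['name']))
    return found
```

You would call it as list_all_profiles(initialize_service()). Note that it issues one request per account and one per web property, so it can be slow for users with many accounts.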

Get the number of Sessions of a Google Analytics View

Now that we know how to get information about our accounts, properties and views, let’s obtain the number of sessions of a view during a period of time.

Create the file get_sessions.py and write:

from analytics_service_object import initialize_service


def get_sessions(service, profile_id, start_date, end_date):
    ids = "ga:" + profile_id
    metrics = "ga:sessions"
    data = service.data().ga().get(
        ids=ids, start_date=start_date, end_date=end_date, metrics=metrics
        ).execute()
    return data["totalsForAllResults"][metrics]


if __name__ == '__main__':
    service = initialize_service()
    profile_id = "your_profile_id"
    print(get_sessions(service, profile_id, "2014-09-01", "2014-09-30"))

Note: replace “your_profile_id” with your view id, and then run this script with:

$ python get_sessions.py

Check all the functionalities of the service.data().ga().get() method, and retrieve all the data you want from your view!
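The start_date and end_date arguments above are plain strings in YYYY-MM-DD form. A small helper can build them for a whole month so you don’t have to remember how many days it has. The name month_range is made up for this sketch.

```python
import calendar


def month_range(year, month):
    """Return (first_day, last_day) of a month as YYYY-MM-DD strings."""
    # monthrange returns (weekday of the first day, number of days).
    last_day = calendar.monthrange(year, month)[1]
    return ('%04d-%02d-01' % (year, month),
            '%04d-%02d-%02d' % (year, month, last_day))


print(month_range(2014, 9))
```

The two strings can then be passed straight to get_sessions() as start_date and end_date.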

Get the number of Sessions for each traffic source

Obtaining the number of sessions for each traffic source (i.e. organic, referral, social, direct, email and other) is a little bit trickier. You have to work with dimensions or filters in order to segment your data.

Here’s a little script that does this, thanks to Michael for the update 🙂

from analytics_service_object import initialize_service

def get_source_group(service, profile_id, start_date, end_date):
    ids = "ga:" + profile_id
    metrics = "ga:sessions"
    dimensions = "ga:channelGrouping"
    data = service.data().ga().get(
        ids=ids, start_date=start_date, end_date=end_date, metrics=metrics,
        dimensions=dimensions).execute()
    # "rows" may be absent when the query matches no data, hence the default
    return dict(
        data.get("rows", []) + [["total", data["totalsForAllResults"][metrics]]])


if __name__ == '__main__':
    service = initialize_service()
    profile_id = "your_profile_id"
    start_date = "2014-09-01"
    end_date = "2014-09-30"
    data = get_source_group(service, profile_id, start_date, end_date)
    for key, value in data.items():
        print("%s %s" % (key, value))

Again, add your view’s id in “your_profile_id”, and change the start_date and end_date to match the time interval you want.

After running this script, you’ll see the desired information in your terminal.
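The values in rows and totalsForAllResults come back as strings, so it can help to convert them before doing arithmetic. The sketch below works on a made-up response that only imitates the shape used above; sessions_by_channel is a hypothetical helper name.

```python
def sessions_by_channel(data):
    """Map each channel name to an integer session count."""
    counts = {}
    for channel, sessions in data.get('rows', []):
        counts[channel] = int(sessions)
    return counts


# Made-up numbers, for illustration only.
sample_response = {
    'rows': [['Direct', '120'], ['Organic Search', '340'], ['Referral', '40']],
    'totalsForAllResults': {'ga:sessions': '500'},
}

counts = sessions_by_channel(sample_response)
total = int(sample_response['totalsForAllResults']['ga:sessions'])
# Print the channels from largest to smallest, with their share of the total.
for channel in sorted(counts, key=counts.get, reverse=True):
    share = 100.0 * counts[channel] / total
    print('%-15s %5d  %5.1f%%' % (channel, counts[channel], share))
```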

Another solution to get the number of sessions by traffic source, less optimized but instructive, is to use filters instead of dimensions:

from analytics_service_object import initialize_service


not_source_filters = {
    "social": "ga:hasSocialSourceReferral==No",
    "organic": "ga:medium!=organic",
    # Negation of the "direct" filter below; OR (,) binds tighter than AND (;)
    "direct":  "ga:source!=(direct),ga:medium!=(none);"
               "ga:source!=(direct),ga:medium!=(not set)",
    "email": "ga:medium!=email",
    "referral": "ga:medium!=referral,ga:hasSocialSourceReferral!=No"
}

source_filters = {
    "social": "ga:hasSocialSourceReferral==Yes",
    "organic": "ga:medium==organic",
    "direct":  "ga:source==(direct);ga:medium==(none),ga:medium==(not set)",
    "email": "ga:medium==email",
    "referral": "ga:medium==referral;ga:hasSocialSourceReferral==No",
    "other": "%s;%s;%s;%s;%s" % (
        not_source_filters["social"], not_source_filters["organic"],
        not_source_filters["direct"], not_source_filters["email"],
        not_source_filters["referral"])
}


def get_source_sessions(service, profile_id, start_date, end_date, source):
    ids = "ga:" + profile_id
    metrics = "ga:sessions"
    filters = source_filters[source]
    data = service.data().ga().get(
        ids=ids, start_date=start_date, end_date=end_date, metrics=metrics,
        filters=filters).execute()
    return data["totalsForAllResults"][metrics]


if __name__ == '__main__':
    service = initialize_service()
    profile_id = "your_profile_id"
    start_date = "2014-09-01"
    end_date = "2014-09-30"
    for source in ["social", "organic", "direct", "email", "referral", "other"]:
        print("%s %s" % (source, get_source_sessions(
            service, profile_id, start_date, end_date, source)))

Again, add your view’s id in “your_profile_id”, and change the start_date and end_date to match the time interval you want.

After running this script, you’ll see the desired information in your terminal.

Some information you may find useful when working with filters:

  • , means OR (it takes precedence over AND)
  • ; means AND
  • == means exact match
  • != means does not match
  • =@ means contains substring
  • !@ means does not contain substring
  • learn more in the Google Reference Guide
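To see how these operators combine, here is a small local evaluator that checks a filter string against a single row of dimension values. It is only a teaching sketch, not part of the API: the names clause_matches and filter_matches are made up, it handles just the four operators listed above, and it ignores backslash escaping.

```python
import re

# name, operator, value; e.g. "ga:medium==organic"
_CLAUSE = re.compile(r'^(?P<name>[^=!]+)(?P<op>==|!=|=@|!@)(?P<value>.*)$')


def clause_matches(clause, row):
    """Evaluate one comparison, such as ga:medium==organic, against a row."""
    match = _CLAUSE.match(clause)
    if match is None:
        raise ValueError('unsupported clause: %r' % clause)
    actual = row[match.group('name')]
    value = match.group('value')
    op = match.group('op')
    if op == '==':
        return actual == value
    if op == '!=':
        return actual != value
    if op == '=@':
        return value in actual
    return value not in actual  # op is '!@'


def filter_matches(filter_string, row):
    """AND together the ;-separated groups, each an OR of ,-separated clauses."""
    return all(
        any(clause_matches(clause, row) for clause in group.split(','))
        for group in filter_string.split(';'))


direct = "ga:source==(direct);ga:medium==(none),ga:medium==(not set)"
print(filter_matches(direct, {"ga:source": "(direct)", "ga:medium": "(none)"}))
print(filter_matches(direct, {"ga:source": "google", "ga:medium": "(none)"}))
```

The first row matches and the second does not: the filter reads as source is (direct) AND (medium is (none) OR medium is (not set)), because the groups are split on ; before the clauses are split on , inside each group.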

That’s all for today! 🙂

Please +1 if this was useful and share it with your friends! Thaaanks!
