This article is a guest post by the amazing Cretu Razvan. He works as a data processor and wrote this blog post through the Write for the Community program.
Hi there! Today I will show you how to get data from a Strapi API, analyze it, and finally push the summary data back to Strapi. All of these steps will be done in Python, using a few packages that make the work much easier.
Let's suppose the following scenario:
You are an experienced data analyst who has just been hired by an e-commerce business that sells various products through an online shop using Strapi as its CMS. They have asked you to help them get more insight into their business performance and revenue.
You will need to perform several tasks: access the shop's API to get the data, clean it, transform it to extract insights, and push the results back to the database the shop uses.
By the end of this tutorial, you should be able to integrate Strapi's API with Python and its many data analysis and visualization packages. You could then build Python scripts that automatically generate plots and export your data locally to any format you like, all by running a single command in your terminal.
To keep this tutorial simple, I won't cover how to set up your code to run from the terminal, but you can check this tutorial on how to run a script from your terminal of choice.
If you are new to Strapi, APIs, or web development, make sure to check out this article by Maxime to learn more about APIs and how to integrate Strapi with other tools.
Strapi
To follow along with this tutorial, you will need a Strapi project that is already up and running. It can be an e-commerce shop, a blog, an inventory app, or any other project you can think of.
If you don't have one yet, don't worry: a local project works just as well. Let's build one quickly.
Using the terminal of your choice, navigate to the directory where you want your project to be stored and simply run the following line:
yarn create strapi-app my-app --quickstart
This bootstraps an application with everything you need to get started with Strapi; read more about the installation here.
Once the installation is complete, go to http://localhost:1337/admin. You will be prompted to sign up to create the first admin user for your project and then log in to the administration panel.
Collection-Types
For the first step, you'll need to create two Collection Types (CT) from the Content-Types Builder. You'll need the following CTs along with their fields:
Once you have created your Collection Types, go to Settings > Administration Panel > Roles and check the boxes to the left of the Collection Types. When a user with the Editor role logs into the administration panel, the sidebar shows only the Collection Types that user has rights for.
For this tutorial, I'll use the Editor role. It is very close to a custom "Data Analyst" role and a perfect fit for this project, since it lets you see all the data created, including entries created by others.
To make authenticated requests, set the permissions under Settings > Users & Permissions plugin > Roles > Authenticated and check the boxes for each action you want to allow on your Collection Types. They should look similar to this:
In a real-life situation, this data will usually come from a third-party service such as Stripe, Shopify, or whichever other e-commerce service the business uses, and it may contain many more fields. For this article, we'll stick to the data I have created. It is more than enough for you to understand the process of analyzing your Strapi data with Python.
Anaconda and Python
Anaconda is one of the most popular platforms for data science. Whether or not you already have Python installed, you don't have to worry: it comes bundled with the Anaconda distribution you'll be installing. You can download it from here.
Also, make sure to read more on how to install Anaconda. Once the installation is complete, you are ready to start analyzing your Strapi data.
By default, Anaconda also installs Jupyter Notebook (along with other useful data analysis tools), which we will use for our analysis. Hit the Launch button under Jupyter Notebook and a browser window should open at http://localhost:8888.
Create a new Python 3 notebook by clicking the New button on the page. It should open another browser tab like so:
Great, now that we are done with the setup, let's begin!
To do your job, you have been given a personal account that grants access to the company's dashboard, with permissions based on your role.
As mentioned after creating the Collection Types, when you log into the administration panel your sidebar will differ from an admin's. In the image below, you can see that our user sees only the Collection Types it has access to:
To mimic the job of a data analyst we'll be using an Editor role which can manage just the Sales and Summaries Collection Types.
Get the data
At the time of writing, Strapi has no built-in feature that lets you download your data, but there's no need to worry. We will use Python to interact with the API and download all the data we need to analyze.
Here's a short description of all the packages we're going to use throughout the entire process:
Let's dive into Jupyter Notebook. One of the first things you'll have to do in your newly created notebook is to import all the libraries we'll be working with. In the first cell of your notebook make sure to run the following lines of code:
import requests as req
import json
import pandas as pd
from datetime import datetime
import plotly.express as px
from pandas.tseries.offsets import MonthEnd
Next, we need to log in to the administration dashboard. With the help of the requests library, we'll make a POST request to the following URL: http://localhost:1337/admin/login
Make sure to pass the email and password assigned to you as the json argument of the request method. Your code should look something like this:
res = req.post(
    "http://localhost:1337/admin/login",
    json={  # requests serializes this dict and sets the Content-Type: application/json header
        "email": "jhon.doe@gmail.com",
        "password": "4L4b4l4p0rt0",
    },
).json()["data"]
If you run this in a Jupyter Notebook cell, it should return something like the following:
The "token" attribute is what you need to pass in the request headers to make authenticated requests and get all the data you need.
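The login cell above hard-codes the password, which is easy to leak when a notebook is shared. A minimal sketch, assuming you store the credentials in two environment variables named STRAPI_EMAIL and STRAPI_PASSWORD (hypothetical names; choose your own), builds the same login body without writing the password into the notebook:

```python
import os
from typing import Mapping


def login_payload(env: Mapping[str, str] = os.environ) -> dict:
    """Build the JSON body for http://localhost:1337/admin/login.

    STRAPI_EMAIL and STRAPI_PASSWORD are hypothetical variable names used
    only in this sketch; a KeyError is raised if either one is missing.
    """
    return {"email": env["STRAPI_EMAIL"], "password": env["STRAPI_PASSWORD"]}


# Usage, assuming both variables were set before launching Jupyter:
# res = req.post("http://localhost:1337/admin/login", json=login_payload()).json()["data"]
```

Taking the environment as a parameter also lets you try the function with a plain dictionary.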
Try running the following lines of code:
sales = req.get(
    "http://localhost:1337/sales",
    headers={"Authorization": f"Bearer {res['token']}"},
).json()

sales
In Python terms, the request returns a list of dictionaries, each representing an entry in the Sales table we have in the administration panel:
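One caveat about this request: in Strapi v3, which the endpoints in this tutorial correspond to, list endpoints return only a limited number of entries by default (100, unless the API configuration changes it), so a larger Sales table would come back truncated. A minimal sketch that pages through every entry with v3's _start and _limit query parameters follows; fetch_all is a hypothetical helper, and the HTTP call is passed in as a function so the paging logic can be tried on its own:

```python
from typing import Callable, List


def fetch_all(fetch_page: Callable[[int, int], List[dict]], page_size: int = 100) -> List[dict]:
    """Collect every entry by requesting consecutive pages.

    fetch_page(start, limit) must return at most `limit` entries starting at
    offset `start`. Keep page_size at or below the server's maximum limit,
    otherwise a capped page looks like the last one and fetching stops early.
    """
    entries: List[dict] = []
    start = 0
    while True:
        page = fetch_page(start, page_size)
        entries.extend(page)
        if len(page) < page_size:  # a short or empty page means there is nothing left
            return entries
        start += page_size


# Wiring it to Strapi v3 (assumes `req` and `res` from the cells above):
# sales = fetch_all(lambda start, limit: req.get(
#     "http://localhost:1337/sales",
#     params={"_start": start, "_limit": limit},
#     headers={"Authorization": f"Bearer {res['token']}"},
# ).json())
```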
With the help of the pandas library, let's transform the list of dictionaries into a DataFrame for better readability. Run the following lines of code:
df = pd.DataFrame(sales)

df
Your DataFrame should look something like this:
Cleaning and Manipulations
Now that we have the data from Strapi, you can see that it is a bit messy and hard to read. One of the first things to do before analyzing any data is to make sure it is clean. If you're new to cleaning data with Python, make sure to check this article.
Simply copy the following lines of code and paste them into a new cell in your Jupyter notebook.
print(df.dtypes)  # Display current dtypes for all columns

df = df.set_index('id')  # Set the index as the id col from Strapi
df = df.drop(['created_at', 'updated_at'], axis=1)  # Drop unwanted columns

def format_time(x):
    # Parse a Strapi timestamp string such as "2021-03-15T10:20:30.000Z" into a datetime
    return datetime.strptime(x, "%Y-%m-%dT%H:%M:%S.%fZ")

df['transaction_date'] = df['transaction_date'].apply(format_time)  # Apply format_time to every value in the column
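format_time treats the trailing Z as literal text, so the parsed values are naive datetimes and the information that they are in UTC is dropped. If you need timezone-aware values (for example, to convert them to the shop's local time later), a variant that replaces the literal Z with the %z directive, which strptime accepts as UTC since Python 3.7, could look like this; format_time_utc is a hypothetical name for this sketch:

```python
from datetime import datetime


def format_time_utc(x: str) -> datetime:
    """Parse a Strapi timestamp such as '2021-03-15T10:20:30.000Z' into a UTC-aware datetime."""
    # %z consumes the trailing 'Z' and attaches a +00:00 offset instead of discarding it
    return datetime.strptime(x, "%Y-%m-%dT%H:%M:%S.%f%z")


parsed = format_time_utc("2021-03-15T10:20:30.000Z")
print(parsed)  # 2021-03-15 10:20:30+00:00
```

You would apply it exactly like format_time; pd.to_datetime(df['transaction_date'], utc=True) achieves the same result in one vectorized call.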
Now that our data is clean, let's create the summary dataset that we'll later push back to Strapi.
To do that, we need a few more columns in our df.
df['revenue'] = df['amount'] * df['price']  # Revenue generated by each transaction

df['month_year'] = df['transaction_date'].dt.strftime('%B %y')  # Month and year (e.g. "January 21") to aggregate on

summary = pd.DataFrame()  # Store summary data
summary['total_sales'] = df.groupby('month_year')['revenue'].sum()  # Aggregate the revenue generated in each month
summary = summary.reset_index()

summary['month_end'] = pd.to_datetime(summary['month_year'], format='%B %y') + MonthEnd(0)  # Last day of the month, pushed back to Strapi as a Date field
summary = summary.sort_values('month_end').reset_index(drop=True)  # Calendar order; groupby sorts the month names alphabetically
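To see what these steps produce, here is a small self-contained run on three made-up transactions (hypothetical values and product names, not data from the shop) using the same column names. The variables are prefixed with toy_ so running it in the same notebook won't overwrite df or summary:

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Three made-up transactions with the tutorial's column names
toy_df = pd.DataFrame({
    "transaction_date": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-02-03"]),
    "product": ["Shoes", "Hat", "Shoes"],
    "amount": [2, 1, 3],
    "price": [50.0, 20.0, 50.0],
})

toy_df["revenue"] = toy_df["amount"] * toy_df["price"]  # 100, 20, 150
toy_df["month_year"] = toy_df["transaction_date"].dt.strftime("%B %y")

# One row per month with the summed revenue
toy_summary = toy_df.groupby("month_year")["revenue"].sum().rename("total_sales").reset_index()
toy_summary["month_end"] = pd.to_datetime(toy_summary["month_year"], format="%B %y") + MonthEnd(0)

# groupby returned "February 21" before "January 21"; sorting on the date restores calendar order
toy_summary = toy_summary.sort_values("month_end").reset_index(drop=True)
print(toy_summary)
```

January's two sales add up to 120.0, February's single sale is 150.0, and MonthEnd(0) moves each parsed first-of-month date to the last day of that month.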
Now that our summary dataset is created, we will use it to make some visualizations that give us better insight into the data. We will also use this dataset to push data back to Strapi.
Save your data locally
Before analyzing the data, let's also create a local copy of it. It can be in any format pandas can write (e.g. Excel, CSV, JSON). I'll be using Excel for this tutorial. If you don't have Excel installed on your machine, you can simply choose another format that pandas.DataFrame can handle; read its documentation here on how to use it.
I'll save the data under C:\Data, but it can be anywhere you want on your computer. Run the following line to save your data locally:
summary.to_excel(r"C:\Data\Summary_data.xlsx", index=False)  # The r prefix keeps the backslashes literal
If you now open C:\Data, you should see the Excel file containing your summary data.
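The save fails if C:\Data does not exist yet, because pandas does not create missing folders. A small helper that creates the folder first with pathlib works for any output format. prepare_output_path is a hypothetical name for this sketch, not part of pandas:

```python
from pathlib import Path


def prepare_output_path(directory: str, filename: str) -> Path:
    """Create `directory` (and any missing parents) and return directory/filename."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists
    return folder / filename


# Usage with the summary DataFrame from above:
# summary.to_excel(prepare_output_path(r"C:\Data", "Summary_data.xlsx"), index=False)
# summary.to_csv(prepare_output_path(r"C:\Data", "Summary_data.csv"), index=False)
```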
Visualizations
We will create our plots with Plotly, an open-source visualization library for Python. Make sure to check out more about this library.
Since we already imported Plotly Express at the beginning of the notebook with import plotly.express as px, you can simply copy the lines below into a new cell:
fig = px.bar(
    summary,
    x='month_year',
    y='total_sales',
)

fig.update_layout(
    title='Total sales per month'
)

fig.show()
Running the cell should generate a bar plot like the one in the image below:
If you would like a line plot instead, simply write the following:
fig = px.line(
    ...  # same arguments as in the px.bar call above
)
The plots above are fairly simple. Suppose we now want to answer the question "How much revenue did each product generate in each month?"
We can answer it with a slightly more complex plot built from the initial dataset we got from Strapi. Run the code below in your Jupyter Notebook:
df1 = pd.DataFrame(df.groupby(['month_year', 'product'])['revenue'].sum()).reset_index()
df1['month_end'] = pd.to_datetime(df1['month_year'], format='%B %y') + MonthEnd(0)

fig = px.histogram(  # with y given, px.histogram sums y within each bin
    df1,
    x='month_end',
    y='revenue',
    color='product',
    title='Total revenue per product at the end of each month',
    barmode='group',
    height=500,
)
fig.update_xaxes(
    dtick="M1",
    tickformat="%b %y",
    ticklabelmode="period",
    tickangle=-45,
    title_text="Month & Year",
    title_font={"size": 15},
)
fig.update_yaxes(
    title_text="Revenue",
    title_font={"size": 15},
)
fig.update_layout(legend_title_text='Products')

fig.show()
If everything ran without errors, you should get a plot like the one below:
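The chart answers the question visually. If you also want the exact figures, df1 can be reshaped into a month-by-product table with pandas' pivot_table. The sketch below uses made-up rows shaped like df1 so it runs on its own; in the notebook you would call df1.pivot_table with the same arguments:

```python
import pandas as pd

# Made-up rows shaped like df1: one revenue total per month and product
sample_df1 = pd.DataFrame({
    "month_end": pd.to_datetime(["2021-01-31", "2021-01-31", "2021-02-28"]),
    "product": ["Hat", "Shoes", "Shoes"],
    "revenue": [20.0, 100.0, 150.0],
})

revenue_table = sample_df1.pivot_table(
    index="month_end",   # one row per month
    columns="product",   # one column per product
    values="revenue",
    aggfunc="sum",
    fill_value=0,        # a product with no sales in a month shows 0 instead of NaN
)
print(revenue_table)
```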
Push back to Strapi
Now that we are done with our visualizations, we want to push our summary data back to Strapi, where it can also be used, for example, to build visualizations on your personal website.
First, we need to transform our summary dataset into JSON. Each row in the DataFrame becomes a JSON object representing an entry in our Summaries table inside Strapi. We need to make one POST request per entry in the summary dataset because, at the time of writing, Strapi does not support creating multiple entries in a single request.
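To see what each request body looks like, here is the same to_json call applied to a one-row DataFrame with made-up values and the summary columns. date_format='iso' turns the month_end timestamp into an ISO 8601 string for Strapi's Date field; the exact suffix of that string (for example, a trailing Z) may differ between pandas versions:

```python
import json

import pandas as pd

# One made-up row with the same columns as `summary`
sample_summary = pd.DataFrame({
    "month_year": ["January 21"],
    "total_sales": [120.0],
    "month_end": pd.to_datetime(["2021-01-31"]),
})

# orient="records" gives a JSON array with one object per row
records = json.loads(sample_summary.to_json(orient="records", date_format="iso"))
print(records[0])
```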
Inside a new cell in your Jupyter notebook, copy and paste the lines of code below:
result = summary.to_json(orient="records", date_format='iso')  # Serialize df as a JSON string, one object per row

for data in json.loads(result):
    req.post(
        "http://localhost:1337/summaries",
        data=json.dumps(data),
        headers={
            "Authorization": f"Bearer {res['token']}",
            "Content-Type": "application/json",
        },
    ).json()
Once you run the cell, and assuming you didn't get any errors, you should see your entries in the admin panel like so:
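The loop above discards each response, so a rejected entry (for example, one refused because the role lacks the create permission) goes unnoticed. A minimal sketch that keeps track of failures follows. The HTTP call is passed in as a function so the logic can be tried without a running Strapi instance; push_records is a hypothetical helper, not part of Strapi or requests:

```python
from typing import Callable, List


def push_records(records: List[dict], post: Callable[[dict], int]) -> List[dict]:
    """Send each record with `post` and return the records that were not accepted.

    `post(record)` should perform the request and return the HTTP status code.
    """
    failed = []
    for record in records:
        status = post(record)
        if not 200 <= status < 300:  # anything outside the 2xx range counts as a failure
            failed.append(record)
    return failed


# Wiring it to Strapi (assumes `req`, `res`, `json` and `result` from earlier cells):
# failed = push_records(
#     json.loads(result),
#     lambda record: req.post(
#         "http://localhost:1337/summaries",
#         json=record,
#         headers={"Authorization": f"Bearer {res['token']}"},
#     ).status_code,
# )
# print(f"{len(failed)} entries were not created")
```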
I hope you enjoyed this article and found it useful. By now, you should be able to build an automated workflow for analyzing Strapi data with Python. It can be whatever data you like, whether blog posts, sales from an e-commerce business, or IoT data; it really depends on what you are using Strapi for.
Cretu works as a data processor and is interested in Python, web development, and data analysis.