Imagine having an AI assistant listen in on your video calls, write down everything being said, and then offer suggestions, answers, and analysis in real time. This series will show you how to build your very own transcription app that does just that.
You can find the outline for the upcoming series here.
By the end of this tutorial series, you will be proficient in using Next.js, Strapi, ChatGPT, and Whisper to create interesting full-stack apps that incorporate cutting-edge AI technology to enhance functionality.
Below is a demo of what we will be building:
To follow this tutorial, you will need the following:
Transcription apps are helping people communicate more effectively. Imagine environments with lots of noise or where there may be language barriers and how AI-assisted audio conversion into text might help. Or imagine how this might help you participate more in conversations if you have a hearing impairment.
AI-powered transcription apps offer a unique advantage in understanding and analyzing meetings. By providing a comprehensive record of conversations and responses, these apps, like ours, leverage technologies such as ChatGPT to enhance clarity and provide valuable insights.
Let's briefly discuss the technology and tools we will be using and their role in helping this app come together.
Strapi, or Strapi CMS, is an open-source headless content management system that allows us to quickly create APIs. We will use Strapi to build our custom API, which will access ChatGPT, and to store data about the transcriptions, such as conversation history.
Next.js is a React framework that simplifies the development of complex and high-performance web applications. It offers many built-in features, such as server-side rendering, automatic code splitting, image optimization, and API routes.
Created by OpenAI, Whisper is a machine-learning transcription and speech recognition model. We will use the OpenAI API to connect with Whisper for speech recognition and transcription.
OpenAI also created ChatGPT, an AI chatbot that can respond to questions and produce various results, such as articles, essays, code, or emails. We will connect to ChatGPT to explain and analyze our transcribed text.
Strapi Cloud is a cloud platform that makes it simple for developers to create, launch, and maintain online apps and services; this is where we will host the Strapi backend for the app (we will use Vercel or Netlify for the frontend). Visit Strapi Cloud to learn more about pricing and features.
Let's create our frontend directory. Navigate to the main folder, which we will call transcribe-tutorial, and enter the following command in the terminal.
npx create-next-app transcribe-frontend
Navigate to this newly created directory and run the following.
yarn dev
This should start the project at http://localhost:3000. When accessed through the web browser, you should see the page below:
Let's do a bit of preliminary setup here. In transcribe-frontend, navigate to pages/index.js. This will be the entry point to our application; delete everything inside this file and paste in the following:
import Head from 'next/head';
import styles from '../styles/Home.module.css';

export default function Home() {
  return (
    <div className={styles.container}>
      <Head>
        <title>Strapi Transcribe</title>
        <meta name="description" content="Generated by create next app" />
        <link rel="icon" href="/favicon.ico" />
      </Head>

      <main className={styles.main}>
        <h1 className={styles.title}>
          Welcome to <a target="_blank" href="https://strapi.io">Strapi Transcribe!</a>
        </h1>
      </main>

      <footer className={styles.footer}>Powered by Strapi</footer>
    </div>
  );
}
Let's install some of the libraries we will use on the frontend. First, we need recordrtc, which will handle the recording process and give us access to raw data captured from the device's microphone. Next, we will use lamejs, which will help us encode that data into MP3 format. We will also need axios to make network calls to OpenAI Whisper. Run the following command in the terminal to install these libraries:
yarn add recordrtc @breezystack/lamejs axios
Create a .env.local file in the root directory and then add the below environment variable with your OpenAI API key:
NEXT_PUBLIC_OPENAI_API_KEY="Paste your API key here"
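A quick note on how this is picked up: Next.js only exposes environment variables prefixed with NEXT_PUBLIC_ to browser code, and it inlines them at build time, so you may need to restart the dev server after editing .env.local. The snippet below is only an illustration (the local apiKey constant is hypothetical here; we read the variable properly in TranscribeContainer later):

// Illustrative only: how the key will be read in client-side code later on.
const apiKey = process.env.NEXT_PUBLIC_OPENAI_API_KEY;

if (!apiKey) {
  console.warn(
    'NEXT_PUBLIC_OPENAI_API_KEY is missing: check .env.local and restart the dev server.'
  );
}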
We will be using the container/presentational pattern to structure the application. This separates the logic from the presentation (UI rendering), which makes the components easier to understand, helps reusability, and makes them more testable. The layout for our file structure is below: TranscribeContainer will host all of our state and logic; the components directory will hold the presentational components; the hooks directory will hold our recording logic; and the utils directory will handle the transcription service.
components/
  meeting/
    MeetingCard.js
  transcription/
    RecordingControls.js
    TranscribedText.js
containers/
  MeetingDashboardContainer.js
  TranscribeContainer.js
hooks/
  useAudioRecorder.js
pages/
  _app.js
  index.js
  transcription.js
utils/
  transcriptionService.js
recordrtc
First of all, let's learn how we can capture audio. We will use recordrtc, a JavaScript library that uses the Web Real-Time Communication (WebRTC) API to capture media streams from the system's microphone. WebRTC provides an easy-to-use interface for handling recordings.
Create a directory named hooks and then a file inside called useAudioRecorder.js. We will keep all of the logic and state for recording audio in this hook; if the application grows in size, this will allow us to use the logic elsewhere in the app without having to repeat ourselves.
Our custom hook will have five functions. Let's go over the code for each one:
const handleStartRecording = async () => {
  try {
    setTranscribed('');

    if (!stream.current) {
      await onStartStreaming();
    }
    if (stream.current) {
      if (!recorder.current) {
        const {
          default: { RecordRTCPromisesHandler, StereoAudioRecorder },
        } = await import('recordrtc');
        const recorderConfig = {
          mimeType: 'audio/wav',
          numberOfAudioChannels: 1,
          recorderType: StereoAudioRecorder,
          sampleRate: 44100,
          timeSlice: streaming ? timeSlice : undefined,
          type: 'audio',
          ondataavailable: streaming ? onDataAvailable : undefined,
        };
        recorder.current = new RecordRTCPromisesHandler(
          stream.current,
          recorderConfig
        );
      }
      if (!encoder.current) {
        const { Mp3Encoder } = await import('@breezystack/lamejs');
        encoder.current = new Mp3Encoder(1, 44100, 96);
      }
      const recordState = await recorder.current.getState();
      if (recordState === 'inactive' || recordState === 'stopped') {
        await recorder.current.startRecording();
      }

      setRecording(true);
    }
  } catch (err) {
    console.error(err);
  }
};
handleStartRecording Hook: This function is asynchronous because we will be making network calls, which we will have to wait for. First, we set the last transcribed text to an empty string to make way for the newly transcribed data; then we check if there's a current audio stream. If there isn't, we start one with the onStartStreaming function:

const onStartStreaming = async () => {
  try {
    if (stream.current) {
      stream.current.getTracks().forEach((track) => track.stop());
    }
    stream.current = await navigator.mediaDevices.getUserMedia({
      audio: true,
    });
  } catch (err) {
    console.error(err);
  }
};
onStartStreaming Hook: This function checks if we have a current stream of audio from our speakers; if so, it stops it. It then uses the navigator.mediaDevices.getUserMedia method, which prompts the user for permission to use a media input and produces a MediaStream. We're requesting audio here with { audio: true }, and we save the result to a stream Ref, which we initialise at the start of the hook; this is the audio stream we pass to recordrtc. Back in handleStartRecording, we then check that the stream started and that there isn't already a recorder, so we can build the configuration object and save a new recorder instance to our recorder Ref.
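One thing to keep in mind: getUserMedia is only available in secure contexts (HTTPS or localhost). A small guard like the sketch below, which is not part of the hook we're building (the ensureAudioCaptureSupport name is hypothetical), can fail fast with a clearer error if audio capture isn't available:

// Hypothetical guard, not part of the tutorial's hook: fail fast when the
// browser (or a non-secure context) doesn't expose getUserMedia at all.
const ensureAudioCaptureSupport = () => {
  if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia) {
    throw new Error(
      'Audio capture unavailable: use HTTPS or localhost and a browser that supports getUserMedia.'
    );
  }
};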
Now that we have the stream up and running and it has passed some initial checks, we dynamically import recordrtc (dynamic importing saves space and makes our program load faster), then we destructure the RecordRTCPromisesHandler and StereoAudioRecorder and set up the configuration object.
The most important parts of the configuration object are: recorderType, where we pass in the StereoAudioRecorder, a class from recordrtc designed to record audio; timeSlice, which determines how often data is sent back to the application; and ondataavailable, the callback recordrtc invokes at the interval specified by timeSlice, to which we pass our onDataAvailable function when we are streaming. Once we have that configured, we assign a new instance of the RecordRTCPromisesHandler to the recorder Ref, passing in the audio stream Ref and the recorderConfig. So our recorder has been initialized, and now we set up our encoder: we dynamically import lamejs and assign our encoder to the encoder Ref.
Lastly, we check the recorder's state to see if it's inactive or stopped, and then we start recording.
const onDataAvailable = async (data) => {
  try {
    if (streaming && recorder.current) {
      if (encoder.current) {
        const buffer = await data.arrayBuffer();
        const mp3chunk = encoder.current.encodeBuffer(new Int16Array(buffer));
        const mp3blob = new Blob([mp3chunk], { type: 'audio/mpeg' });
        chunks.current.push(mp3blob);
      }
      const recorderState = await recorder.current.getState();
      if (recorderState === 'recording') {
        const blob = new Blob(chunks.current, { type: 'audio/mpeg' });
        const file = new File([blob], 'speech.mp3', { type: 'audio/mpeg' });
        const text = await transcriptionService(
          file,
          apiKey,
          whisperApiEndpoint,
          'transcriptions'
        );
        setTranscribed(text);
      }
    }
  } catch (err) {
    console.error(err);
  }
};
So, as discussed, when we are recording, recordrtc will call onDataAvailable periodically with chunks of audio data.
onDataAvailable Hook: This checks that we are streaming audio and have a current recorder to avoid errors. This is where we encode our audio to MP3. First, it checks if an encoder is available; if it is, it converts the received audio data into an array buffer, encodes that buffer into MP3 format, and pushes it to our chunks Ref. Next, it gets the recorder state to check if we are still recording, then concatenates the MP3 chunks into a single blob, which it packages into a File object. Now we have our audio file, which we send to Whisper to transcribe with transcriptionService. This is just a util function, which I will explain later; finally, we set the transcribed text in state so it can be displayed in the UI.
The other functions we have are handleStopRecording and onStopStreaming:
const handleStopRecording = async () => {
  try {
    if (recorder.current) {
      const recordState = await recorder.current.getState();
      if (recordState === 'recording' || recordState === 'paused') {
        await recorder.current.stopRecording();
      }

      onStopStreaming();
      setRecording(false);

      await recorder.current.destroy();
      chunks.current = [];
      if (encoder.current) {
        encoder.current.flush();
        encoder.current = undefined;
      }
      recorder.current = undefined;
    }
  } catch (err) {
    console.error(err);
  }
};
handleStopRecording Hook: This gets the current state to make sure we are actually recording (or paused) and then makes a call to stop it; it also calls the onStopStreaming function.
onStopStreaming Hook: This checks if we have a current audio stream and stops it if so:

const onStopStreaming = () => {
  if (stream.current) {
    stream.current.getTracks().forEach((track) => track.stop());
    stream.current = undefined;
  }
};
It is time to implement the recording feature of this app to allow users to transcribe from their system's mic input. Paste the entire code below into your useAudioRecorder.js file:
import { useState, useRef, useEffect } from 'react';
import { transcriptionService } from '../utils/transcriptionService';

export const useAudioRecorder = (
  streaming,
  timeSlice,
  apiKey,
  whisperApiEndpoint
) => {
  const chunks = useRef([]);
  const encoder = useRef();
  const recorder = useRef();
  const stream = useRef();
  const [recording, setRecording] = useState(false);
  const [transcribed, setTranscribed] = useState('');

  useEffect(() => {
    return () => {
      if (chunks.current) {
        chunks.current = [];
      }
      if (encoder.current) {
        encoder.current.flush();
        encoder.current = undefined;
      }
      if (recorder.current) {
        recorder.current.destroy();
        recorder.current = undefined;
      }

      if (stream.current) {
        stream.current.getTracks().forEach((track) => track.stop());
        stream.current = undefined;
      }
    };
  }, []);

  const onStartStreaming = async () => {
    try {
      if (stream.current) {
        stream.current.getTracks().forEach((track) => track.stop());
      }
      stream.current = await navigator.mediaDevices.getUserMedia({
        audio: true,
      });
    } catch (err) {
      console.error(err);
    }
  };

  const onStopStreaming = () => {
    if (stream.current) {
      stream.current.getTracks().forEach((track) => track.stop());
      stream.current = undefined;
    }
  };

  const handleStartRecording = async () => {
    try {
      setTranscribed('');

      if (!stream.current) {
        await onStartStreaming();
      }
      if (stream.current) {
        if (!recorder.current) {
          const {
            default: { RecordRTCPromisesHandler, StereoAudioRecorder },
          } = await import('recordrtc');
          const recorderConfig = {
            mimeType: 'audio/wav',
            numberOfAudioChannels: 1,
            recorderType: StereoAudioRecorder,
            sampleRate: 44100,
            timeSlice: streaming ? timeSlice : undefined,
            type: 'audio',
            ondataavailable: streaming ? onDataAvailable : undefined,
          };
          recorder.current = new RecordRTCPromisesHandler(
            stream.current,
            recorderConfig
          );
        }
        if (!encoder.current) {
          const { Mp3Encoder } = await import('@breezystack/lamejs');
          encoder.current = new Mp3Encoder(1, 44100, 96);
        }
        const recordState = await recorder.current.getState();
        if (recordState === 'inactive' || recordState === 'stopped') {
          await recorder.current.startRecording();
        }

        setRecording(true);
      }
    } catch (err) {
      console.error(err);
    }
  };

  const handleStopRecording = async () => {
    try {
      if (recorder.current) {
        const recordState = await recorder.current.getState();
        if (recordState === 'recording' || recordState === 'paused') {
          await recorder.current.stopRecording();
        }

        onStopStreaming();
        setRecording(false);

        await recorder.current.destroy();
        chunks.current = [];
        if (encoder.current) {
          encoder.current.flush();
          encoder.current = undefined;
        }
        recorder.current = undefined;
      }
    } catch (err) {
      console.error(err);
    }
  };

  const onDataAvailable = async (data) => {
    try {
      if (streaming && recorder.current) {
        if (encoder.current) {
          const buffer = await data.arrayBuffer();
          const mp3chunk = encoder.current.encodeBuffer(new Int16Array(buffer));
          const mp3blob = new Blob([mp3chunk], { type: 'audio/mpeg' });
          chunks.current.push(mp3blob);
        }
        const recorderState = await recorder.current.getState();
        if (recorderState === 'recording') {
          const blob = new Blob(chunks.current, { type: 'audio/mpeg' });
          const file = new File([blob], 'speech.mp3', { type: 'audio/mpeg' });
          const text = await transcriptionService(
            file,
            apiKey,
            whisperApiEndpoint,
            'transcriptions'
          );
          setTranscribed(text);
        }
      }
    } catch (err) {
      console.error(err);
    }
  };

  return {
    recording,
    transcribed,
    handleStartRecording,
    handleStopRecording,
    setTranscribed,
  };
};
In the code above, you may notice that we have a useEffect cleanup hook. This is just to ensure that any allocated resources are cleaned up when the component using this hook unmounts.
The transcriptionService called from our hook will call the Whisper API using Axios. We append our audio file to the request body, which is created using the built-in JavaScript FormData() constructor.
Create a utils directory in the root of the application, then create a file named transcriptionService.js and paste in the following code:
import axios from 'axios';

export const transcriptionService = async (
  file,
  apiKey,
  whisperApiEndpoint,
  mode
) => {
  const body = new FormData();
  body.append('file', file);
  body.append('model', 'whisper-1');
  body.append('language', 'en');

  const headers = {};
  headers['Content-Type'] = 'multipart/form-data';

  if (apiKey) {
    headers['Authorization'] = `Bearer ${apiKey}`;
  }

  const response = await axios.post(`${whisperApiEndpoint}${mode}`, body, {
    headers,
  });

  return response.data.text;
};
That's all the code we need to transcribe from our system's mic input.
Let's look at building the UI so we can reason visually about where to connect the API later. We will need to create a dashboard that shows our saved meetings and allows us to start new ones, and then a view to show the transcriptions; let's finish off the transcription view and then build the dashboard.
First, delete everything in the globals.css file in the styles directory and replace it with the following core styles:
html,
body {
  padding: 0;
  margin: 0;
  font-family: -apple-system, BlinkMacSystemFont, Segoe UI, Roboto, Oxygen,
    Ubuntu, Cantarell, Fira Sans, Droid Sans, Helvetica Neue, sans-serif;
}

:root {
  --primary: #4945ff;
  --primaryLight: #7572ff;
  --secondary: #8c4bff;
  --secondaryLight: #a47fff;
  --headerColor: #1a1a1a;
  --bodyTextColor: #4e4b66;
  --bodyTextColorWhite: #fafbfc;
  /* 13px - 16px */
  --topperFontSize: clamp(0.8125rem, 1.6vw, 1rem);
  /* 31px - 49px */
  --headerFontSize: clamp(1.9375rem, 3.9vw, 3.0625rem);
  --bodyFontSize: 1rem;
  /* 60px - 100px top and bottom */
  --sectionPadding: clamp(3.75rem, 7.82vw, 6.25rem) 1rem;
}

*,
*:before,
*:after {
  /* prevents padding from affecting height and width */
  box-sizing: border-box;
}
Create the containers directory in the application's root and then create a file named TranscribeContainer.js. This is where we use our recording hook to capture and display the transcriptions. Paste the following code into the newly created file:
import React, { useState } from 'react';
import styles from '../styles/Transcribe.module.css';
import { useAudioRecorder } from '../hooks/useAudioRecorder';
import RecordingControls from '../components/transcription/RecordingControls';
import TranscribedText from '../components/transcription/TranscribedText';

const mockAnswer =
  'Example answer to transcription here: Lorem ipsum dolor sit amet consectetur adipisicing elit. Velit distinctio quas asperiores reiciendis! Facilis quia recusandae velfacere delect corrupti!';
const mockAnalysis =
  'Example analysis to transcription here: Lorem ipsum dolor sit amet consectetur adipisicing elit. Velit distinctio quas asperiores reiciendis! Facilis quia recusandae velfacere delect corrupti!';

const TranscribeContainer = ({ streaming = true, timeSlice = 1000 }) => {
  const [analysis, setAnalysis] = useState('');
  const [answer, setAnswer] = useState('');
  const apiKey = process.env.NEXT_PUBLIC_OPENAI_API_KEY;
  const whisperApiEndpoint = 'https://api.openai.com/v1/audio/';
  const { recording, transcribed, handleStartRecording, handleStopRecording, setTranscribed } =
    useAudioRecorder(streaming, timeSlice, apiKey, whisperApiEndpoint);

  const handleGetAnalysis = () => {
    setAnalysis(mockAnalysis);
  };

  const handleGetAnswer = () => {
    setAnswer(mockAnswer);
  };

  const handleStopMeeting = () => {};

  return (
    <div style={{ margin: '20px' }}>
      <button
        className={styles['end-meeting-button']}
        onClick={handleStopMeeting}
      >
        End Meeting
      </button>
      <input
        type="text"
        placeholder="Meeting title here..."
        className={styles['custom-input']}
      />
      <div>
        <RecordingControls
          handleStartRecording={handleStartRecording}
          handleStopRecording={handleStopRecording}
        />
        {recording ? (
          <p className={styles['primary-text']}>Recording</p>
        ) : (
          <p>Not recording</p>
        )}
        <TranscribedText
          transcribed={transcribed}
          answer={answer}
          analysis={analysis}
          handleGetAnalysis={handleGetAnalysis}
          handleGetAnswer={handleGetAnswer}
        />
      </div>
    </div>
  );
};

export default TranscribeContainer;
Here, we import the useAudioRecorder hook, initialize it with the required variables, and destructure the values we need from it. We also have an end meeting button and an input where users can name their meeting.
There are some display components: RecordingControls, which will just be a component to hold our control buttons, and TranscribedText, which will be used to display our transcriptions and any analysis we get from ChatGPT. As you can see from the code above, we are passing the text props to it and a couple of functions, which will just be mocked for now.
Create a components directory, and inside that, create a transcription directory. Create a file named RecordingControls.js and paste in the following code:

import styles from '../../styles/Transcribe.module.css';

function RecordingControls({ handleStartRecording, handleStopRecording }) {
  return (
    <div className={styles['control-container']}>
      <button
        className={styles['primary-button']}
        onClick={handleStartRecording}
      >
        Start Recording
      </button>
      <button
        className={styles['secondary-button']}
        onClick={handleStopRecording}
      >
        Stop Recording
      </button>
    </div>
  );
}

export default RecordingControls;
This is just a simple flex container with a couple of buttons.
Next, in the same transcription directory, create a file named TranscribedText.js and paste the following code inside:

import styles from '../../styles/Transcribe.module.css';

function TranscribedText({
  transcribed,
  answer,
  analysis,
  handleGetAnalysis,
  handleGetAnswer,
}) {
  return (
    <div className={styles['transcribed-text-container']}>
      <div className={styles['speech-bubble-container']}>
        {transcribed && (
          <div className={styles['speech-bubble']}>
            <div className={styles['speech-pointer']}></div>
            <div className={styles['speech-text-question']}>{transcribed}</div>
            <div className={styles['button-container']}>
              <button
                className={styles['primary-button-analysis']}
                onClick={handleGetAnalysis}
              >
                Get analysis
              </button>
              <button
                className={styles['primary-button-answer']}
                onClick={handleGetAnswer}
              >
                Get answer
              </button>
            </div>
          </div>
        )}
      </div>
      <div>
        <div className={styles['speech-bubble-container']}>
          {analysis && (
            <div className={styles['analysis-bubble']}>
              <div className={styles['analysis-pointer']}></div>
              <p style={{ margin: 0 }}>Analysis</p>
              <div className={styles['speech-text-answer']}>{analysis}</div>
            </div>
          )}
        </div>
        <div className={styles['speech-bubble-container']}>
          {answer && (
            <div className={styles['speech-bubble-right']}>
              <div className={styles['speech-pointer-right']}></div>
              <p style={{ margin: 0 }}>Answer</p>
              <div className={styles['speech-text-answer']}>{answer}</div>
            </div>
          )}
        </div>
      </div>
    </div>
  );
}

export default TranscribedText;
This is just to display each transcribed chunk of text with its corresponding information.
We need to create the CSS module file so our components display correctly. In the styles directory, create a file named Transcribe.module.css and paste in the following CSS code:
.control-container {
  margin: 0 auto;
  width: 380px;
}

.button-container {
  display: flex;
  justify-content: flex-end;
  margin: 10px;
}

.primary-text {
  color: var(--primaryLight);
}

.primary-button {
  background-color: var(--primary);
  color: white;
  border: none;
  border-radius: 5px;
  padding: 10px 20px;
  font-size: 16px;
  cursor: pointer;
  margin: 10px;
}

.primary-button:hover {
  background-color: var(--primaryLight);
}

.primary-button-analysis {
  background-color: var(--secondaryLight);
  color: black;
  border: none;
  border-radius: 5px;
  padding: 10px 20px;
  font-size: 16px;
  cursor: pointer;
  margin: 10px;
}

.primary-button-answer {
  background-color: #c8e6c9;
  color: black;
  border: none;
  border-radius: 5px;
  padding: 10px 20px;
  font-size: 16px;
  cursor: pointer;
  margin: 10px;
}

.primary-button-answer:hover {
  background-color: var(--primaryLight);
}
.primary-button-analysis:hover {
  background-color: var(--primaryLight);
}

.secondary-button {
  background-color: #d3d3d3;
  color: black;
  border: none;
  border-radius: 5px;
  padding: 10px 20px;
  font-size: 16px;
  cursor: pointer;
}

.secondary-button:hover {
  background-color: #b0b0b0;
}

.end-meeting-button {
  background-color: red;
  color: white;
  border: none;
  border-radius: 5px;
  padding: 10px 20px;
  font-size: 16px;
  cursor: pointer;
}

.end-meeting-button {
  position: absolute;
  top: 0;
  right: 0;
  padding: 10px 20px;
  background-color: red;
  color: white;
  border: none;
  border-radius: 5px;
  cursor: pointer;
  margin: 20px;
}

.end-meeting-button:hover {
  background-color: darkred;
}

.transcribed-text-container {
  position: relative;
  display: flex;
  flex-direction: row;
  align-items: center;
  justify-content: space-between;
}

.speech-bubble-container {
  width: 80%;
  margin: 20px;
}

.speech-bubble {
  position: relative;
  background-color: var(--primaryLight);
  border: 2px solid var(--primaryLight);
  border-radius: 8px;
  padding: 10px;
}

.speech-pointer {
  position: absolute;
  top: 0;
  left: -19px;
  width: 0;
  height: 0;
  border-style: solid;
  border-width: 0 0 20px 20px;
  border-color: transparent transparent var(--primaryLight) transparent;
}

.speech-text-question {
  margin: 0;
  font-size: 16px;
  line-height: 16px;
  letter-spacing: 1.4px;
  font-family: 'Gill Sans', 'Gill Sans MT', Calibri, 'Trebuchet MS', sans-serif;
  color: white;
}

.speech-bubble-right {
  position: relative;
  background-color: #c8e6c9;
  border: 2px solid #c8e6c9;
  border-radius: 8px;
  padding: 10px;
}

.speech-pointer-right {
  position: absolute;
  top: -2px;
  right: -17px;
  width: 0;
  height: 0;
  border-style: solid;
  border-width: 0 0 20px 20px;
  border-color: transparent transparent transparent #c8e6c9;
}

.speech-text-answer {
  margin: 0;
  font-size: 14px;
  line-height: 21px;
  letter-spacing: 1.8px;
  font-family: 'Gill Sans', 'Gill Sans MT', Calibri, 'Trebuchet MS', sans-serif;
  color: black;
}

.analysis-bubble {
  position: relative;
  background-color: var(--secondaryLight);
  border: 2px solid var(--secondaryLight);
  border-radius: 8px;
  padding: 10px;
}

.analysis-pointer {
  position: absolute;
  top: -2px;
  right: -17px;
  width: 0;
  height: 0;
  border-style: solid;
  border-width: 0 0 20px 20px;
  border-color: transparent transparent transparent var(--secondaryLight);
}

.transcribed-text-container {
  position: relative;
  display: flex;
  flex-direction: row;
  align-items: center;
  justify-content: space-between;
}

.custom-input {
  border: none;
  border-bottom: 2px solid #000;
  padding: 5px 0;
  width: 100%;
  box-sizing: border-box;
  margin: 20px;
  line-height: 1.15;
  font-size: 4rem;
}

.custom-input:focus {
  outline: none;
  border-bottom: 2px solid var(--primary);
  margin: 20px;
}

.title {
  margin: 20px;
  line-height: 1.15;
  font-size: 4rem;
}

.goBackButton {
  margin-right: 10px;
  padding: 5px 10px;
  background-color: #0070f3;
  color: white;
  border: none;
  border-radius: 5px;
  cursor: pointer;
}

.goBackButton:hover {
  background-color: #005bb5;
}

@media (max-width: 700px) {
  .transcribed-text-container {
    flex-direction: column;
    align-items: flex-start;
  }

  .button-container {
    width: 100%;
  }

  .primary-button {
    width: 100%;
    margin: 5px 0;
  }
}
Since TranscribeContainer will be accessed from the meeting dashboard, we must use the Next.js built-in router. To do that, we can just create a file in the pages directory, so go ahead and create transcription.js there and paste the following code in:

import React from 'react';
import styles from '../styles/Home.module.css';
import TranscribeContainer from '../containers/TranscribeContainer';

const Transcription = () => {
  return (
    <div className={styles.container}>
      <main className={styles.main}>
        <TranscribeContainer />
      </main>
    </div>
  );
};

export default Transcription;
Please add the following styles to the Home.module.css file:
.header {
  display: flex;
  align-items: center;
  margin-top: 20px;
}

.goBackButton {
  margin-right: 10px;
  padding: 5px 10px;
  background-color: #0070f3;
  color: white;
  border: none;
  border-radius: 5px;
  cursor: pointer;
}

.goBackButton:hover {
  background-color: #005bb5;
}
Next, create a MeetingDashboardContainer.js file in the containers directory:

import React from 'react';
import styles from '../styles/Meeting.module.css';
import MeetingCard from '../components/meeting/MeetingCard';
import Link from 'next/link';

const meeting = [
  {
    overview:
      'Overview of the meeting here Lorem ipsum dolor sit amet consectetur adipisicing elit. Velit distinctio quas asperiores reiciendis! Facilis quia recusandae velfacere delect corrupti!',
    title: 'Example title 1',
  },
  {
    overview:
      'Overview of the meeting here Lorem ipsum dolor sit amet consectetur adipisicing elit. Velit distinctio quas asperiores reiciendis! Facilis quia recusandae velfacere delect corrupti!',
    title: 'Example title 2',
  },
  {
    overview:
      'Overview of the meeting here Lorem ipsum dolor sit amet consectetur adipisicing elit. Velit distinctio quas asperiores reiciendis! Facilis quia recusandae velfacere delect corrupti!',
    title: 'Example title 3',
  },
];

const MeetingDashboardContainer = () => {
  return (
    <div id={styles['meeting-container']}>
      <div className={styles['cs-container']}>
        <div className={styles['cs-content']}>
          <div className={styles['cs-content-flex']}>
            <span className={styles['cs-topper']}>Meeting dashboard</span>
            <h2 className={styles['cs-title']}>Start a new meeting!</h2>
          </div>
          <Link href="/transcription" className={styles['cs-button-solid']}>
            New meeting
          </Link>
        </div>
        <ul className={styles['cs-card-group']}>
          {meeting.map((val, i) => {
            return (
              <MeetingCard
                key={i}
                title={val.title}
                overview={val.overview.split(' ').slice(0, 30).join(' ') + '...'}
              />
            );
          })}
        </ul>
      </div>
    </div>
  );
};

export default MeetingDashboardContainer;
This is where the user will first land in our application; it's just a page to welcome the user, show a history of saved meetings, and allow them to start a new one.
For now, we are mocking the data, which we will later get from our API, with the const called meeting. We map over its contents and display each entry with a component called MeetingCard. Notice we are truncating the overview prop passed to MeetingCard, as this will likely be a long paragraph and we only want to display a preview in the card.
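If you prefer to keep that truncation out of the JSX, a tiny helper like the sketch below would do the same job (truncateWords is a hypothetical name, not one of the tutorial files):

// Hypothetical helper that previews roughly the first 30 words of an overview.
const truncateWords = (text, count = 30) =>
  text.split(' ').slice(0, count).join(' ') + '...';

// Usage inside the map:
// <MeetingCard key={i} title={val.title} overview={truncateWords(val.overview)} />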
Now let's create the MeetingCard component. Create a directory called meeting in the components directory and a file called MeetingCard.js with the following:

import styles from '../../styles/Meeting.module.css';

const MeetingCard = ({ title, overview }) => {
  return (
    <li className={styles['cs-item']}>
      <div className={styles['cs-flex']}>
        <h3 className={styles['cs-h3']}>{title}</h3>
        <p className={styles['cs-item-text']}>{overview}</p>
        <a href="" className={styles['cs-link']}>
          Open meeting
          <img
            className={styles['cs-arrow']}
            loading="lazy"
            decoding="async"
            src="https://csimg.nyc3.cdn.digitaloceanspaces.com/Icons/event-chevron.svg"
            alt="icon"
            width="20"
            height="20"
            aria-hidden="true"
          />
        </a>
      </div>
    </li>
  );
};

export default MeetingCard;
Now, let's create the styles for the meeting dashboard. Create a file called Meeting.module.css in the styles directory with the following CSS:
@media only screen and (min-width: 0rem) {
  #meeting-container {
    padding: var(--sectionPadding);
    position: relative;
    z-index: 1;
    min-height: 100vh;
  }
  #meeting-container .cs-container {
    width: 100%;
    max-width: 49rem;
    margin: auto;
    display: flex;
    flex-direction: column;
    align-items: center;
    gap: clamp(3rem, 6vw, 4rem);
    min-height: 100vh;
  }
  #meeting-container .cs-content {
    text-align: left;
    width: 100%;
    display: flex;
    flex-direction: column;
    align-items: flex-start;
  }

  #meeting-container .cs-title {
    max-width: 20ch;
  }
  #meeting-container .cs-button-solid {
    font-size: 1rem;
    line-height: clamp(2.875rem, 5.5vw, 3.5rem);
    text-decoration: none;
    font-weight: 700;
    text-align: center;
    margin: 0;
    color: white;
    min-width: 12.5rem;
    padding: 0 1.5rem;
    background-color: var(--secondary);
    border-radius: 0.5rem;
    display: inline-block;
    position: relative;
    z-index: 1;
    box-sizing: border-box;
    transition: color 0.3s;
    cursor: pointer;
  }
  #meeting-container .cs-button-solid:before {
    content: '';
    position: absolute;
    height: 100%;
    width: 0%;
    background: #000;
    opacity: 1;
    top: 0;
    left: 0;
    z-index: -1;
    border-radius: 0.5rem;
    transition: width 0.3s;
  }
  #meeting-container .cs-button-solid:hover {
    color: #fff;
  }
  #meeting-container .cs-button-solid:hover:before {
    width: 100%;
  }
  #meeting-container .cs-card-group {
    width: 100%;
    padding: 0;
    margin: 0;
    display: grid;
    grid-template-columns: repeat(12, 1fr);
    gap: 1.25rem;
  }
  #meeting-container .cs-item {
    text-align: left;
    list-style: none;
    border-radius: 1rem;
    overflow: hidden;
    background-color: #f7f7f7;
    border: 1px solid #e8e8e8;
    grid-column: span 12;
    display: flex;
    flex-direction: column;
    justify-content: space-between;
    position: relative;
    z-index: 1;
    transition: box-shadow 0.3s, transform 0.3s;
  }
  #meeting-container .cs-item:hover {
    box-shadow: rgba(149, 157, 165, 0.2) 0px 8px 24px;
  }
  #meeting-container .cs-item:hover .cs-picture img {
    opacity: 0.3;
    transform: scale(1.1);
  }
  #meeting-container .cs-flex {
    height: 100%;
    padding: 1.5rem;
    /* prevents padding and border from affecting height and width */
    box-sizing: border-box;
    display: flex;
    flex-direction: column;
    align-items: flex-start;
    position: relative;
    z-index: 2;
  }
  #meeting-container .cs-h3 {
    font-size: 1.25rem;
    text-align: inherit;
    line-height: 1.2em;
    font-weight: 700;
    color: var(--headerColor);
    margin: 0 0 0.75rem 0;
    transition: color 0.3s;
  }
  #meeting-container .cs-item-text {
    /* 14px - 16px */
    font-size: clamp(0.875rem, 1.5vw, 1rem);
    line-height: 1.5em;
    text-align: inherit;
    margin: 0 0 1.25rem;
    color: var(--bodyTextColor);
  }
  #meeting-container .cs-link {
    font-size: 1rem;
    line-height: 1.2em;
    font-weight: 700;
    text-decoration: none;
    margin-top: auto;
    color: var(--primary);
    display: flex;
    align-items: center;
    justify-content: center;
    cursor: pointer;
  }
  #meeting-container .cs-link:hover .cs-arrow {
    transform: translateX(0.25rem);
  }
  #meeting-container .cs-arrow {
    width: 1.25rem;
    height: auto;
    transition: transform 0.3s;
  }
}
/* Tablet - 768px */
@media only screen and (min-width: 48rem) {
  #meeting-container .cs-container {
    max-width: 80rem;
  }
  #meeting-container .cs-content {
    text-align: left;
    flex-direction: row;
    justify-content: space-between;
    align-items: flex-end;
  }
  #meeting-container .cs-title {
    margin: 0;
  }
  #meeting-container .cs-item {
    grid-column: span 4;
  }
}

.cs-topper {
  font-size: var(--topperFontSize);
  line-height: 1.2em;
  text-transform: uppercase;
  text-align: inherit;
  letter-spacing: 0.1em;
  font-weight: 700;
  color: var(--primary);
  margin-bottom: 0.25rem;
  display: block;
}

.cs-title {
  font-size: var(--headerFontSize);
  font-weight: 900;
  line-height: 1.2em;
  text-align: inherit;
  max-width: 43.75rem;
  margin: 0 0 1rem 0;
  color: var(--headerColor);
  position: relative;
}

.cs-text {
  font-size: var(--bodyFontSize);
  line-height: 1.5em;
  text-align: inherit;
  width: 100%;
  max-width: 40.625rem;
  margin: 0;
  color: var(--bodyTextColor);
}
Lastly, import MeetingDashboardContainer into index.js:
import Head from 'next/head';
import styles from '../styles/Home.module.css';
import MeetingDashboardContainer from '../containers/MeetingDashboardContainer';

export default function Home() {
  return (
    <div className={styles.container}>
      <Head>
        <title>Strapi Transcribe</title>
        <meta name="description" content="Generated by create next app" />
        <link rel="icon" href="/favicon.ico" />
      </Head>

      <main className={styles.main}>
        <h1 className={styles.title}>
          Welcome to{' '}
          <a target="_blank" href="https://strapi.io">
            Strapi Transcribe!
          </a>
        </h1>
        <MeetingDashboardContainer />
      </main>

      <footer className={styles.footer}>Powered by Strapi</footer>
    </div>
  );
}
Now that we have our dashboard UI and transcription view set up, we can test the code.
Open up your terminal, navigate to the frontend, and run the below command:
yarn dev
Now navigate to http://localhost:3000 in your browser, and you should see the following interface:
To start transcribing, first click on "New meeting" and then click "Start Recording". Then, talk into your computer's microphone (be aware that this will cost you OpenAI credits, so don't leave it running for too long). You can click "Stop Recording" to stop the transcription.
Test your app by clicking on the "New Meeting" button as shown in the GIF below:
For a more real-world use case, open your desktop meeting app (Slack or Teams), send yourself a meeting invite, and join from your mobile phone. If you hit record, you can then speak through your phone from another room (to avoid feedback). You will see that the app picks up what's coming through the laptop's speakers via its microphone and transcribes it, successfully simulating the transcription of a virtual meeting.
In part two of this series, we will set up our backend with Strapi. Stay tuned to see how we will structure our data to save meetings and transcriptions programmatically with the API and how we will link this to our Next.js app.
Hey! 👋 I'm Mike, a seasoned web developer with 5 years of full-stack expertise. Passionate about tech's impact on the world, I'm on a journey to blend code with compelling stories. Let's explore the tech landscape together! 🚀✍️