Imagine having an AI assistant listen in on your video calls, write down everything being said, and then offer suggestions, answers, and analysis in real time. This series will show you how to build your very own transcription app that does just that.
You can find the outline for the upcoming series here.
By the end of this tutorial series, you will be proficient in using Next.js, Strapi, ChatGPT, and Whisper to create interesting full-stack apps that incorporate cutting-edge AI technology to enhance functionality.
Below is a demo of what we will be building:
To follow this tutorial, you will need the following:
Transcription apps are helping people communicate more effectively. Imagine environments with lots of noise or where there may be language barriers and how AI-assisted audio conversion into text might help. Or imagine how this might help you participate more in conversations if you have a hearing impairment.
AI-powered transcription apps offer a unique advantage in understanding and analyzing meetings. By providing a comprehensive record of conversations and responses, these apps, like ours, leverage technologies such as ChatGPT to enhance clarity and provide valuable insights.
Let's briefly discuss the technology and tools we will be using and their role in helping this app come together.
Strapi, or Strapi CMS, is an open-source headless content management system that allows us to quickly create APIs. We will use Strapi to build our custom API, which will access ChatGPT, and to store data about the transcriptions, such as conversation history.
Next.js is a React framework that simplifies the development of complex and high-performance web applications. It offers many built-in features, such as server-side rendering, automatic code splitting, image optimization, and API routes.
Created by OpenAI, Whisper is a machine-learning transcription and speech recognition model. We will use the OpenAI API to connect with Whisper for speech recognition and transcription.
OpenAI also created ChatGPT, an AI chatbot that can respond to questions and produce various results, such as articles, essays, code, or emails. We will connect to ChatGPT to explain and analyze our transcribed text.
Strapi Cloud is a cloud platform that makes it simple for developers to create, launch, and maintain online apps and services; this is where we will host the Strapi backend for the app (we will use Vercel or Netlify for the frontend). Visit Strapi Cloud to learn more about pricing and features.
Let's create our frontend directory. Navigate to the main folder, which we will call transcribe-tutorial, and enter the following command in the terminal.
npx create-next-app transcribe-frontend
Navigate to this newly created directory and run the following.
yarn dev
This should start the project at http://localhost:3000. When accessed through the web browser, you should see the page below:
Let's do a bit of preliminary setup here. In transcribe-frontend, navigate to pages/index.js. This will be the entry point to our application; delete everything inside this file and paste in the following:
import Head from 'next/head';
import styles from '../styles/Home.module.css';

export default function Home() {
  return (
    <div className={styles.container}>
      <Head>
        <title>Strapi Transcribe</title>
        <meta name="description" content="Generated by create next app" />
        <link rel="icon" href="/favicon.ico" />
      </Head>

      <main className={styles.main}>
        <h1 className={styles.title}>
          Welcome to <a target="_blank" href="https://strapi.io">Strapi Transcribe!</a>
        </h1>
      </main>

      <footer className={styles.footer}>Powered by Strapi</footer>
    </div>
  );
}
Let's install some of the libraries we will use on the frontend. First, we need recordrtc, which will handle the recording process and give us access to raw data captured from the device's microphone. Next, we will use lamejs, which will help us encode that data into MP3 format. We will also need axios to make network calls to OpenAI Whisper. Run the following command in the terminal to install these libraries:
yarn add recordrtc @breezystack/lamejs axios
Create a .env.local file in the root directory and then add the below environment variable with your OpenAI API key:
NEXT_PUBLIC_OPENAI_API_KEY="Paste your API key here"
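A quick note on how this is picked up: Next.js only exposes environment variables prefixed with NEXT_PUBLIC_ to browser code, and it inlines them at build time, so you may need to restart the dev server after editing .env.local. The snippet below is only an illustration (the local apiKey constant is hypothetical here; we read the variable properly in TranscribeContainer later):

// Illustrative only: how the key will be read in client-side code later on.
const apiKey = process.env.NEXT_PUBLIC_OPENAI_API_KEY;

if (!apiKey) {
  console.warn(
    'NEXT_PUBLIC_OPENAI_API_KEY is missing: check .env.local and restart the dev server.'
  );
}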
We will be using the container/presentational pattern to structure the application. This separates the logic from the presentation (UI rendering), which makes the components easier to understand, helps reusability, and makes them more testable. The layout for our file structure is below: TranscribeContainer will host all of our state and logic; the components directory will hold the presentational components; the hooks directory will hold our recording logic; and the utils directory will handle the transcription service.
components/
  meeting/
    MeetingCard.js
  transcription/
    RecordingControls.js
    TranscribedText.js
containers/
  MeetingDashboardContainer.js
  TranscribeContainer.js
hooks/
  useAudioRecorder.js
pages/
  _app.js
  index.js
  transcription.js
utils/
  transcriptionService.js
recordrtc
First of all, let's learn how we can capture audio. We will use recordrtc, a JavaScript library that uses the Web Real-Time Communication (WebRTC) API to capture media streams from the system's microphone. WebRTC provides an easy-to-use interface for handling recordings.
Create a directory named hooks and then a file inside called useAudioRecorder.js. We will keep all of the logic and state for recording audio in this hook; if the application grows in size, this will allow us to use the logic elsewhere in the app without having to repeat ourselves.
Our custom hook will have five functions. Let's go over the code for each one:
const handleStartRecording = async () => {
  try {
    setTranscribed('');

    if (!stream.current) {
      await onStartStreaming();
    }
    if (stream.current) {
      if (!recorder.current) {
        const {
          default: { RecordRTCPromisesHandler, StereoAudioRecorder },
        } = await import('recordrtc');
        const recorderConfig = {
          mimeType: 'audio/wav',
          numberOfAudioChannels: 1,
          recorderType: StereoAudioRecorder,
          sampleRate: 44100,
          timeSlice: streaming ? timeSlice : undefined,
          type: 'audio',
          ondataavailable: streaming ? onDataAvailable : undefined,
        };
        recorder.current = new RecordRTCPromisesHandler(
          stream.current,
          recorderConfig
        );
      }
      if (!encoder.current) {
        const { Mp3Encoder } = await import('@breezystack/lamejs');
        encoder.current = new Mp3Encoder(1, 44100, 96);
      }
      const recordState = await recorder.current.getState();
      if (recordState === 'inactive' || recordState === 'stopped') {
        await recorder.current.startRecording();
      }

      setRecording(true);
    }
  } catch (err) {
    console.error(err);
  }
};
handleStartRecording Hook: This function is asynchronous because we will be making network calls, which we will have to wait for. First, we set the last transcribed text to an empty string to make way for the newly transcribed data; then we check if there's a current audio stream. If there isn't, we start one with the onStartStreaming function:

const onStartStreaming = async () => {
  try {
    if (stream.current) {
      stream.current.getTracks().forEach((track) => track.stop());
    }
    stream.current = await navigator.mediaDevices.getUserMedia({
      audio: true,
    });
  } catch (err) {
    console.error(err);
  }
};
onStartStreaming Hook: This function checks if we have a current stream of audio from our speakers; if so, it stops it. It then uses the navigator.mediaDevices.getUserMedia method, which prompts the user for permission to use a media input and produces a MediaStream. We're requesting audio here with { audio: true }, and we save the result to a stream Ref, which we initialise at the start of the hook; this is the audio stream we pass to recordrtc. Back in handleStartRecording, we then check that the stream started and that there isn't already a recorder, so we can build the configuration object and save a new recorder instance to our recorder Ref.
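One thing to keep in mind: getUserMedia is only available in secure contexts (HTTPS or localhost). A small guard like the sketch below, which is not part of the hook we're building (the ensureAudioCaptureSupport name is hypothetical), can fail fast with a clearer error if audio capture isn't available:

// Hypothetical guard, not part of the tutorial's hook: fail fast when the
// browser (or a non-secure context) doesn't expose getUserMedia at all.
const ensureAudioCaptureSupport = () => {
  if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia) {
    throw new Error(
      'Audio capture unavailable: use HTTPS or localhost and a browser that supports getUserMedia.'
    );
  }
};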
Now that we have the stream up and running and it has passed some initial checks, we dynamically import recordrtc (dynamic importing saves space and makes our program load faster), then we destructure the RecordRTCPromisesHandler and StereoAudioRecorder and set up the configuration object.
The most important parts of the configuration object are: recorderType, where we pass in the StereoAudioRecorder, a class from recordrtc designed to record audio; timeSlice, which determines how often data is sent back to the application; and ondataavailable, the callback recordrtc invokes at the interval specified by timeSlice, to which we pass our onDataAvailable function when we are streaming. Once we have that configured, we assign a new instance of the RecordRTCPromisesHandler to the recorder Ref, passing in the audio stream Ref and the recorderConfig. So our recorder has been initialized, and now we set up our encoder: we dynamically import lamejs and assign our encoder to the encoder Ref.
Lastly, we check the recorder's state to see if it's inactive or stopped, and then we start recording.
const onDataAvailable = async (data) => {
  try {
    if (streaming && recorder.current) {
      if (encoder.current) {
        const buffer = await data.arrayBuffer();
        const mp3chunk = encoder.current.encodeBuffer(new Int16Array(buffer));
        const mp3blob = new Blob([mp3chunk], { type: 'audio/mpeg' });
        chunks.current.push(mp3blob);
      }
      const recorderState = await recorder.current.getState();
      if (recorderState === 'recording') {
        const blob = new Blob(chunks.current, { type: 'audio/mpeg' });
        const file = new File([blob], 'speech.mp3', { type: 'audio/mpeg' });
        const text = await transcriptionService(
          file,
          apiKey,
          whisperApiEndpoint,
          'transcriptions'
        );
        setTranscribed(text);
      }
    }
  } catch (err) {
    console.error(err);
  }
};
So, as discussed, when we are recording, recordrtc will call onDataAvailable periodically with chunks of audio data.
onDataAvailable Hook: This checks that we are streaming audio and have a current recorder to avoid errors. This is where we encode our audio to MP3. First, it checks if an encoder is available; if it is, it converts the received audio data into an array buffer, encodes that buffer into MP3 format, and pushes it to our chunks Ref. Next, it gets the recorder state to check if we are still recording, then concatenates the MP3 chunks into a single blob, which it packages into a File object. Now we have our audio file, which we send to Whisper to transcribe with transcriptionService. This is just a util function, which I will explain later; finally, we set the transcribed text in state so it can be displayed in the UI.
The other functions we have are handleStopRecording and onStopStreaming:
const handleStopRecording = async () => {
  try {
    if (recorder.current) {
      const recordState = await recorder.current.getState();
      if (recordState === 'recording' || recordState === 'paused') {
        await recorder.current.stopRecording();
      }

      onStopStreaming();
      setRecording(false);

      await recorder.current.destroy();
      chunks.current = [];
      if (encoder.current) {
        encoder.current.flush();
        encoder.current = undefined;
      }
      recorder.current = undefined;
    }
  } catch (err) {
    console.error(err);
  }
};
handleStopRecording Hook: This gets the current state to make sure we are actually recording (or paused) and then makes a call to stop it; it also calls the onStopStreaming function.
onStopStreaming Hook: This checks if we have a current audio stream and stops it if so:

const onStopStreaming = () => {
  if (stream.current) {
    stream.current.getTracks().forEach((track) => track.stop());
    stream.current = undefined;
  }
};
It is time to implement the recording feature of this app to allow users to transcribe from their system's mic input. Paste the entire code below into your useAudioRecorder.js file:
import { useState, useRef, useEffect } from 'react';
import { transcriptionService } from '../utils/transcriptionService';

export const useAudioRecorder = (
  streaming,
  timeSlice,
  apiKey,
  whisperApiEndpoint
) => {
  const chunks = useRef([]);
  const encoder = useRef();
  const recorder = useRef();
  const stream = useRef();
  const [recording, setRecording] = useState(false);
  const [transcribed, setTranscribed] = useState('');

  useEffect(() => {
    return () => {
      if (chunks.current) {
        chunks.current = [];
      }
      if (encoder.current) {
        encoder.current.flush();
        encoder.current = undefined;
      }
      if (recorder.current) {
        recorder.current.destroy();
        recorder.current = undefined;
      }

      if (stream.current) {
        stream.current.getTracks().forEach((track) => track.stop());
        stream.current = undefined;
      }
    };
  }, []);

  const onStartStreaming = async () => {
    try {
      if (stream.current) {
        stream.current.getTracks().forEach((track) => track.stop());
      }
      stream.current = await navigator.mediaDevices.getUserMedia({
        audio: true,
      });
    } catch (err) {
      console.error(err);
    }
  };

  const onStopStreaming = () => {
    if (stream.current) {
      stream.current.getTracks().forEach((track) => track.stop());
      stream.current = undefined;
    }
  };

  const handleStartRecording = async () => {
    try {
      setTranscribed('');

      if (!stream.current) {
        await onStartStreaming();
      }
      if (stream.current) {
        if (!recorder.current) {
          const {
            default: { RecordRTCPromisesHandler, StereoAudioRecorder },
          } = await import('recordrtc');
          const recorderConfig = {
            mimeType: 'audio/wav',
            numberOfAudioChannels: 1,
            recorderType: StereoAudioRecorder,
            sampleRate: 44100,
            timeSlice: streaming ? timeSlice : undefined,
            type: 'audio',
            ondataavailable: streaming ? onDataAvailable : undefined,
          };
          recorder.current = new RecordRTCPromisesHandler(
            stream.current,
            recorderConfig
          );
        }
        if (!encoder.current) {
          const { Mp3Encoder } = await import('@breezystack/lamejs');
          encoder.current = new Mp3Encoder(1, 44100, 96);
        }
        const recordState = await recorder.current.getState();
        if (recordState === 'inactive' || recordState === 'stopped') {
          await recorder.current.startRecording();
        }

        setRecording(true);
      }
    } catch (err) {
      console.error(err);
    }
  };

  const handleStopRecording = async () => {
    try {
      if (recorder.current) {
        const recordState = await recorder.current.getState();
        if (recordState === 'recording' || recordState === 'paused') {
          await recorder.current.stopRecording();
        }

        onStopStreaming();
        setRecording(false);

        await recorder.current.destroy();
        chunks.current = [];
        if (encoder.current) {
          encoder.current.flush();
          encoder.current = undefined;
        }
        recorder.current = undefined;
      }
    } catch (err) {
      console.error(err);
    }
  };

  const onDataAvailable = async (data) => {
    try {
      if (streaming && recorder.current) {
        if (encoder.current) {
          const buffer = await data.arrayBuffer();
          const mp3chunk = encoder.current.encodeBuffer(new Int16Array(buffer));
          const mp3blob = new Blob([mp3chunk], { type: 'audio/mpeg' });
          chunks.current.push(mp3blob);
        }
        const recorderState = await recorder.current.getState();
        if (recorderState === 'recording') {
          const blob = new Blob(chunks.current, { type: 'audio/mpeg' });
          const file = new File([blob], 'speech.mp3', { type: 'audio/mpeg' });
          const text = await transcriptionService(
            file,
            apiKey,
            whisperApiEndpoint,
            'transcriptions'
          );
          setTranscribed(text);
        }
      }
    } catch (err) {
      console.error(err);
    }
  };

  return {
    recording,
    transcribed,
    handleStartRecording,
    handleStopRecording,
    setTranscribed,
  };
};
In the code above, you may notice that we have a useEffect cleanup hook. This is just to ensure that any allocated resources are cleaned up when the component using this hook unmounts.
The transcriptionService called from our hook will call the Whisper API using Axios. We append our audio file to the request body, which is created using the built-in JavaScript FormData() constructor.
Create a utils directory in the root of the application, then create a file named transcriptionService.js and paste in the following code:
import axios from 'axios';

export const transcriptionService = async (
  file,
  apiKey,
  whisperApiEndpoint,
  mode
) => {
  const body = new FormData();
  body.append('file', file);
  body.append('model', 'whisper-1');
  body.append('language', 'en');

  const headers = {};
  headers['Content-Type'] = 'multipart/form-data';

  if (apiKey) {
    headers['Authorization'] = `Bearer ${apiKey}`;
  }

  const response = await axios.post(`${whisperApiEndpoint}${mode}`, body, {
    headers,
  });

  return response.data.text;
};
That's all the code we need to transcribe from our system's mic input.
Let's look at building the UI so we can reason visually about where to connect the API later. We will need to create a dashboard that shows our saved meetings and allows us to start new ones, and then a view to show the transcriptions; let's finish off the transcription view and then build the dashboard.
First, delete everything in the globals.css file in the styles directory and replace it with the following core styles:
html,
body {
  padding: 0;
  margin: 0;
  font-family: -apple-system, BlinkMacSystemFont, Segoe UI, Roboto, Oxygen,
    Ubuntu, Cantarell, Fira Sans, Droid Sans, Helvetica Neue, sans-serif;
}

:root {
  --primary: #4945ff;
  --primaryLight: #7572ff;
  --secondary: #8c4bff;
  --secondaryLight: #a47fff;
  --headerColor: #1a1a1a;
  --bodyTextColor: #4e4b66;
  --bodyTextColorWhite: #fafbfc;
  /* 13px - 16px */
  --topperFontSize: clamp(0.8125rem, 1.6vw, 1rem);
  /* 31px - 49px */
  --headerFontSize: clamp(1.9375rem, 3.9vw, 3.0625rem);
  --bodyFontSize: 1rem;
  /* 60px - 100px top and bottom */
  --sectionPadding: clamp(3.75rem, 7.82vw, 6.25rem) 1rem;
}

*,
*:before,
*:after {
  /* prevents padding from affecting height and width */
  box-sizing: border-box;
}
Create the containers directory in the application's root and then create a file named TranscribeContainer.js. This is where we use our recording hook to capture and display the transcriptions. Paste the following code into the newly created file:
import React, { useState } from 'react';
import styles from '../styles/Transcribe.module.css';
import { useAudioRecorder } from '../hooks/useAudioRecorder';
import RecordingControls from '../components/transcription/RecordingControls';
import TranscribedText from '../components/transcription/TranscribedText';

const mockAnswer =
  'Example answer to transcription here: Lorem ipsum dolor sit amet consectetur adipisicing elit. Velit distinctio quas asperiores reiciendis! Facilis quia recusandae velfacere delect corrupti!';
const mockAnalysis =
  'Example analysis to transcription here: Lorem ipsum dolor sit amet consectetur adipisicing elit. Velit distinctio quas asperiores reiciendis! Facilis quia recusandae velfacere delect corrupti!';

const TranscribeContainer = ({ streaming = true, timeSlice = 1000 }) => {
  const [analysis, setAnalysis] = useState('');
  const [answer, setAnswer] = useState('');
  const apiKey = process.env.NEXT_PUBLIC_OPENAI_API_KEY;
  const whisperApiEndpoint = 'https://api.openai.com/v1/audio/';
  const { recording, transcribed, handleStartRecording, handleStopRecording, setTranscribed } =
    useAudioRecorder(streaming, timeSlice, apiKey, whisperApiEndpoint);

  const handleGetAnalysis = () => {
    setAnalysis(mockAnalysis);
  };

  const handleGetAnswer = () => {
    setAnswer(mockAnswer);
  };

  const handleStopMeeting = () => {};

  return (
    <div style={{ margin: '20px' }}>
      <button
        className={styles['end-meeting-button']}
        onClick={handleStopMeeting}
      >
        End Meeting
      </button>
      <input
        type="text"
        placeholder="Meeting title here..."
        className={styles['custom-input']}
      />
      <div>
        <RecordingControls
          handleStartRecording={handleStartRecording}
          handleStopRecording={handleStopRecording}
        />
        {recording ? (
          <p className={styles['primary-text']}>Recording</p>
        ) : (
          <p>Not recording</p>
        )}
        <TranscribedText
          transcribed={transcribed}
          answer={answer}
          analysis={analysis}
          handleGetAnalysis={handleGetAnalysis}
          handleGetAnswer={handleGetAnswer}
        />
      </div>
    </div>
  );
};

export default TranscribeContainer;
Here, we import the useAudioRecorder hook, initialize it with the required variables, and destructure the values we need from it. We also have an end meeting button and an input where users can name their meeting.
There are some display components: RecordingControls, which will just be a component to hold our control buttons, and TranscribedText, which will be used to display our transcriptions and any analysis we get from ChatGPT. As you can see from the code above, we are passing the text props to it and a couple of functions, which will just be mocked for now.
Create a components directory, and inside that, create a transcription directory. Create a file named RecordingControls.js and paste in the following code:

import styles from '../../styles/Transcribe.module.css';

function RecordingControls({ handleStartRecording, handleStopRecording }) {
  return (
    <div className={styles['control-container']}>
      <button
        className={styles['primary-button']}
        onClick={handleStartRecording}
      >
        Start Recording
      </button>
      <button
        className={styles['secondary-button']}
        onClick={handleStopRecording}
      >
        Stop Recording
      </button>
    </div>
  );
}

export default RecordingControls;
This is just a simple flex container with a couple of buttons.
Next, in the same transcription directory, create a file named TranscribedText.js and paste the following code inside:

import styles from '../../styles/Transcribe.module.css';

function TranscribedText({
  transcribed,
  answer,
  analysis,
  handleGetAnalysis,
  handleGetAnswer,
}) {
  return (
    <div className={styles['transcribed-text-container']}>
      <div className={styles['speech-bubble-container']}>
        {transcribed && (
          <div className={styles['speech-bubble']}>
            <div className={styles['speech-pointer']}></div>
            <div className={styles['speech-text-question']}>{transcribed}</div>
            <div className={styles['button-container']}>
              <button
                className={styles['primary-button-analysis']}
                onClick={handleGetAnalysis}
              >
                Get analysis
              </button>
              <button
                className={styles['primary-button-answer']}
                onClick={handleGetAnswer}
              >
                Get answer
              </button>
            </div>
          </div>
        )}
      </div>
      <div>
        <div className={styles['speech-bubble-container']}>
          {analysis && (
            <div className={styles['analysis-bubble']}>
              <div className={styles['analysis-pointer']}></div>
              <p style={{ margin: 0 }}>Analysis</p>
              <div className={styles['speech-text-answer']}>{analysis}</div>
            </div>
          )}
        </div>
        <div className={styles['speech-bubble-container']}>
          {answer && (
            <div className={styles['speech-bubble-right']}>
              <div className={styles['speech-pointer-right']}></div>
              <p style={{ margin: 0 }}>Answer</p>
              <div className={styles['speech-text-answer']}>{answer}</div>
            </div>
          )}
        </div>
      </div>
    </div>
  );
}

export default TranscribedText;
This is just to display each transcribed chunk of text with its corresponding information.
We need to create the CSS module file so our components display correctly. In the styles directory, create a file named Transcribe.module.css and paste in the following CSS code:
.control-container {
  margin: 0 auto;
  width: 380px;
}

.button-container {
  display: flex;
  justify-content: flex-end;
  margin: 10px;
}

.primary-text {
  color: var(--primaryLight);
}

.primary-button {
  background-color: var(--primary);
  color: white;
  border: none;
  border-radius: 5px;
  padding: 10px 20px;
  font-size: 16px;
  cursor: pointer;
  margin: 10px;
}

.primary-button:hover {
  background-color: var(--primaryLight);
}

.primary-button-analysis {
  background-color: var(--secondaryLight);
  color: black;
  border: none;
  border-radius: 5px;
  padding: 10px 20px;
  font-size: 16px;
  cursor: pointer;
  margin: 10px;
}

.primary-button-answer {
  background-color: #c8e6c9;
  color: black;
  border: none;
  border-radius: 5px;
  padding: 10px 20px;
  font-size: 16px;
  cursor: pointer;
  margin: 10px;
}

.primary-button-answer:hover {
  background-color: var(--primaryLight);
}
.primary-button-analysis:hover {
  background-color: var(--primaryLight);
}

.secondary-button {
  background-color: #d3d3d3;
  color: black;
  border: none;
  border-radius: 5px;
  padding: 10px 20px;
  font-size: 16px;
  cursor: pointer;
}

.secondary-button:hover {
  background-color: #b0b0b0;
}

.end-meeting-button {
  background-color: red;
  color: white;
  border: none;
  border-radius: 5px;
  padding: 10px 20px;
  font-size: 16px;
  cursor: pointer;
}

.end-meeting-button {
  position: absolute;
  top: 0;
  right: 0;
  padding: 10px 20px;
  background-color: red;
  color: white;
  border: none;
  border-radius: 5px;
  cursor: pointer;
  margin: 20px;
}

.end-meeting-button:hover {
  background-color: darkred;
}

.transcribed-text-container {
  position: relative;
  display: flex;
  flex-direction: row;
  align-items: center;
  justify-content: space-between;
}

.speech-bubble-container {
  width: 80%;
  margin: 20px;
}

.speech-bubble {
  position: relative;
  background-color: var(--primaryLight);
  border: 2px solid var(--primaryLight);
  border-radius: 8px;
  padding: 10px;
}

.speech-pointer {
  position: absolute;
  top: 0;
  left: -19px;
  width: 0;
  height: 0;
  border-style: solid;
  border-width: 0 0 20px 20px;
  border-color: transparent transparent var(--primaryLight) transparent;
}

.speech-text-question {
  margin: 0;
  font-size: 16px;
  line-height: 16px;
  letter-spacing: 1.4px;
  font-family: 'Gill Sans', 'Gill Sans MT', Calibri, 'Trebuchet MS', sans-serif;
  color: white;
}

.speech-bubble-right {
  position: relative;
  background-color: #c8e6c9;
  border: 2px solid #c8e6c9;
  border-radius: 8px;
  padding: 10px;
}

.speech-pointer-right {
  position: absolute;
  top: -2px;
  right: -17px;
  width: 0;
  height: 0;
  border-style: solid;
  border-width: 0 0 20px 20px;
  border-color: transparent transparent transparent #c8e6c9;
}

.speech-text-answer {
  margin: 0;
  font-size: 14px;
  line-height: 21px;
  letter-spacing: 1.8px;
  font-family: 'Gill Sans', 'Gill Sans MT', Calibri, 'Trebuchet MS', sans-serif;
  color: black;
}

.analysis-bubble {
  position: relative;
  background-color: var(--secondaryLight);
  border: 2px solid var(--secondaryLight);
  border-radius: 8px;
  padding: 10px;
}

.analysis-pointer {
  position: absolute;
  top: -2px;
  right: -17px;
  width: 0;
  height: 0;
  border-style: solid;
  border-width: 0 0 20px 20px;
  border-color: transparent transparent transparent var(--secondaryLight);
}

.transcribed-text-container {
  position: relative;
  display: flex;
  flex-direction: row;
  align-items: center;
  justify-content: space-between;
}

.custom-input {
  border: none;
  border-bottom: 2px solid #000;
  padding: 5px 0;
  width: 100%;
  box-sizing: border-box;
  margin: 20px;
  line-height: 1.15;
  font-size: 4rem;
}

.custom-input:focus {
  outline: none;
  border-bottom: 2px solid var(--primary);
  margin: 20px;
}

.title {
  margin: 20px;
  line-height: 1.15;
  font-size: 4rem;
}

.goBackButton {
  margin-right: 10px;
  padding: 5px 10px;
  background-color: #0070f3;
  color: white;
  border: none;
  border-radius: 5px;
  cursor: pointer;
}

.goBackButton:hover {
  background-color: #005bb5;
}

@media (max-width: 700px) {
  .transcribed-text-container {
    flex-direction: column;
    align-items: flex-start;
  }

  .button-container {
    width: 100%;
  }

  .primary-button {
    width: 100%;
    margin: 5px 0;
  }
}
Since TranscribeContainer will be accessed from the meeting dashboard, we must use the Next.js built-in router. To do that, we can just create a file in the pages directory, so go ahead and create transcription.js there and paste the following code in:

import React from 'react';
import styles from '../styles/Home.module.css';
import TranscribeContainer from '../containers/TranscribeContainer';

const Transcription = () => {
  return (
    <div className={styles.container}>
      <main className={styles.main}>
        <TranscribeContainer />
      </main>
    </div>
  );
};

export default Transcription;
Please add the following styles to the Home.module.css file:
.header {
  display: flex;
  align-items: center;
  margin-top: 20px;
}

.goBackButton {
  margin-right: 10px;
  padding: 5px 10px;
  background-color: #0070f3;
  color: white;
  border: none;
  border-radius: 5px;
  cursor: pointer;
}

.goBackButton:hover {
  background-color: #005bb5;
}
Next, create a MeetingDashboardContainer.js file in the containers directory:

import React from 'react';
import styles from '../styles/Meeting.module.css';
import MeetingCard from '../components/meeting/MeetingCard';
import Link from 'next/link';

const meeting = [
  {
    overview:
      'Overview of the meeting here Lorem ipsum dolor sit amet consectetur adipisicing elit. Velit distinctio quas asperiores reiciendis! Facilis quia recusandae velfacere delect corrupti!',
    title: 'Example title 1',
  },
  {
    overview:
      'Overview of the meeting here Lorem ipsum dolor sit amet consectetur adipisicing elit. Velit distinctio quas asperiores reiciendis! Facilis quia recusandae velfacere delect corrupti!',
    title: 'Example title 2',
  },
  {
    overview:
      'Overview of the meeting here Lorem ipsum dolor sit amet consectetur adipisicing elit. Velit distinctio quas asperiores reiciendis! Facilis quia recusandae velfacere delect corrupti!',
    title: 'Example title 3',
  },
];

const MeetingDashboardContainer = () => {
  return (
    <div id={styles['meeting-container']}>
      <div className={styles['cs-container']}>
        <div className={styles['cs-content']}>
          <div className={styles['cs-content-flex']}>
            <span className={styles['cs-topper']}>Meeting dashboard</span>
            <h2 className={styles['cs-title']}>Start a new meeting!</h2>
          </div>
          <Link href="/transcription" className={styles['cs-button-solid']}>
            New meeting
          </Link>
        </div>
        <ul className={styles['cs-card-group']}>
          {meeting.map((val, i) => {
            return (
              <MeetingCard
                key={i}
                title={val.title}
                overview={val.overview.split(' ').slice(0, 30).join(' ') + '...'}
              />
            );
          })}
        </ul>
      </div>
    </div>
  );
};

export default MeetingDashboardContainer;
This is where the user will first land in our application; it's just a page to welcome the user, show a history of saved meetings, and allow them to start a new one.
For now, we are mocking the data, which we will later get from our API, with the const called meeting. We map over its contents and display each entry with a component called MeetingCard. Notice we are truncating the overview prop passed to MeetingCard, as this will likely be a long paragraph and we only want to display a preview in the card.
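If you prefer to keep that truncation out of the JSX, a tiny helper like the sketch below would do the same job (truncateWords is a hypothetical name, not one of the tutorial files):

// Hypothetical helper that previews roughly the first 30 words of an overview.
const truncateWords = (text, count = 30) =>
  text.split(' ').slice(0, count).join(' ') + '...';

// Usage inside the map:
// <MeetingCard key={i} title={val.title} overview={truncateWords(val.overview)} />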
Now let's create the MeetingCard component. Create a directory called meeting in the components directory and a file called MeetingCard.js with the following:

import styles from '../../styles/Meeting.module.css';

const MeetingCard = ({ title, overview }) => {
  return (
    <li className={styles['cs-item']}>
      <div className={styles['cs-flex']}>
        <h3 className={styles['cs-h3']}>{title}</h3>
        <p className={styles['cs-item-text']}>{overview}</p>
        <a href="" className={styles['cs-link']}>
          Open meeting
          <img
            className={styles['cs-arrow']}
            loading="lazy"
            decoding="async"
            src="https://csimg.nyc3.cdn.digitaloceanspaces.com/Icons/event-chevron.svg"
            alt="icon"
            width="20"
            height="20"
            aria-hidden="true"
          />
        </a>
      </div>
    </li>
  );
};

export default MeetingCard;
Now, let's create the styles for the meeting dashboard. Create a file called Meeting.module.css in the styles directory with the following CSS:
@media only screen and (min-width: 0rem) {
  #meeting-container {
    padding: var(--sectionPadding);
    position: relative;
    z-index: 1;
    min-height: 100vh;
  }
  #meeting-container .cs-container {
    width: 100%;
    max-width: 49rem;
    margin: auto;
    display: flex;
    flex-direction: column;
    align-items: center;
    gap: clamp(3rem, 6vw, 4rem);
    min-height: 100vh;
  }
  #meeting-container .cs-content {
    text-align: left;
    width: 100%;
    display: flex;
    flex-direction: column;
    align-items: flex-start;
  }

  #meeting-container .cs-title {
    max-width: 20ch;
  }
  #meeting-container .cs-button-solid {
    font-size: 1rem;
    line-height: clamp(2.875rem, 5.5vw, 3.5rem);
    text-decoration: none;
    font-weight: 700;
    text-align: center;
    margin: 0;
    color: white;
    min-width: 12.5rem;
    padding: 0 1.5rem;
    background-color: var(--secondary);
    border-radius: 0.5rem;
    display: inline-block;
    position: relative;
    z-index: 1;
    box-sizing: border-box;
    transition: color 0.3s;
    cursor: pointer;
  }
  #meeting-container .cs-button-solid:before {
    content: '';
    position: absolute;
    height: 100%;
    width: 0%;
    background: #000;
    opacity: 1;
    top: 0;
    left: 0;
    z-index: -1;
    border-radius: 0.5rem;
    transition: width 0.3s;
  }
  #meeting-container .cs-button-solid:hover {
    color: #fff;
  }
  #meeting-container .cs-button-solid:hover:before {
    width: 100%;
  }
  #meeting-container .cs-card-group {
    width: 100%;
    padding: 0;
    margin: 0;
    display: grid;
    grid-template-columns: repeat(12, 1fr);
    gap: 1.25rem;
  }
  #meeting-container .cs-item {
    text-align: left;
    list-style: none;
    border-radius: 1rem;
    overflow: hidden;
    background-color: #f7f7f7;
    border: 1px solid #e8e8e8;
    grid-column: span 12;
    display: flex;
    flex-direction: column;
    justify-content: space-between;
    position: relative;
    z-index: 1;
    transition: box-shadow 0.3s, transform 0.3s;
  }
  #meeting-container .cs-item:hover {
    box-shadow: rgba(149, 157, 165, 0.2) 0px 8px 24px;
  }
  #meeting-container .cs-item:hover .cs-picture img {
    opacity: 0.3;
    transform: scale(1.1);
  }
  #meeting-container .cs-flex {
    height: 100%;
    padding: 1.5rem;
    /* prevents padding and border from affecting height and width */
    box-sizing: border-box;
    display: flex;
    flex-direction: column;
    align-items: flex-start;
    position: relative;
    z-index: 2;
  }
  #meeting-container .cs-h3 {
    font-size: 1.25rem;
    text-align: inherit;
    line-height: 1.2em;
    font-weight: 700;
    color: var(--headerColor);
    margin: 0 0 0.75rem 0;
    transition: color 0.3s;
  }
  #meeting-container .cs-item-text {
    /* 14px - 16px */
    font-size: clamp(0.875rem, 1.5vw, 1rem);
    line-height: 1.5em;
    text-align: inherit;
    margin: 0 0 1.25rem;
    color: var(--bodyTextColor);
  }
  #meeting-container .cs-link {
    font-size: 1rem;
    line-height: 1.2em;
    font-weight: 700;
    text-decoration: none;
    margin-top: auto;
    color: var(--primary);
    display: flex;
    align-items: center;
    justify-content: center;
    cursor: pointer;
  }
  #meeting-container .cs-link:hover .cs-arrow {
    transform: translateX(0.25rem);
  }
  #meeting-container .cs-arrow {
    width: 1.25rem;
    height: auto;
    transition: transform 0.3s;
  }
}
/* Tablet - 768px */
@media only screen and (min-width: 48rem) {
  #meeting-container .cs-container {
    max-width: 80rem;
  }
  #meeting-container .cs-content {
    text-align: left;
    flex-direction: row;
    justify-content: space-between;
    align-items: flex-end;
  }
  #meeting-container .cs-title {
    margin: 0;
  }
  #meeting-container .cs-item {
    grid-column: span 4;
  }
}

.cs-topper {
  font-size: var(--topperFontSize);
  line-height: 1.2em;
  text-transform: uppercase;
  text-align: inherit;
  letter-spacing: 0.1em;
  font-weight: 700;
  color: var(--primary);
  margin-bottom: 0.25rem;
  display: block;
}

.cs-title {
  font-size: var(--headerFontSize);
  font-weight: 900;
  line-height: 1.2em;
  text-align: inherit;
  max-width: 43.75rem;
  margin: 0 0 1rem 0;
  color: var(--headerColor);
  position: relative;
}

.cs-text {
  font-size: var(--bodyFontSize);
  line-height: 1.5em;
  text-align: inherit;
  width: 100%;
  max-width: 40.625rem;
  margin: 0;
  color: var(--bodyTextColor);
}
Lastly, import MeetingDashboardContainer into index.js:
import Head from 'next/head';
import styles from '../styles/Home.module.css';
import MeetingDashboardContainer from '../containers/MeetingDashboardContainer';

export default function Home() {
  return (
    <div className={styles.container}>
      <Head>
        <title>Strapi Transcribe</title>
        <meta name="description" content="Generated by create next app" />
        <link rel="icon" href="/favicon.ico" />
      </Head>

      <main className={styles.main}>
        <h1 className={styles.title}>
          Welcome to{' '}
          <a target="_blank" href="https://strapi.io">
            Strapi Transcribe!
          </a>
        </h1>
        <MeetingDashboardContainer />
      </main>

      <footer className={styles.footer}>Powered by Strapi</footer>
    </div>
  );
}
Now that we have our dashboard UI and transcription view set up, we can test the code.
Open up your terminal, navigate to the frontend, and run the below command:
yarn dev
Now navigate to http://localhost:3000 in your browser, and you should see the following interface:
To start transcribing, first click on "New meeting" and then click "Start Recording". Then, talk into your computer's microphone (be aware that this will cost you OpenAI credits, so don't leave it running for too long). You can click "Stop Recording" to stop the transcription.
Test your app by clicking on the "New Meeting" button as shown in the GIF below:
For a more real-world use case, open your desktop meeting app (Slack or Teams), send yourself a meeting invite, and join from your mobile phone. If you hit record, you can then speak through your phone from another room (to avoid feedback). You will see that the app picks up what's coming through the laptop's speakers via its microphone and transcribes it, successfully simulating the transcription of a virtual meeting.
In part two of this series, we will set up our backend with Strapi. Stay tuned to see how we will structure our data to save meetings and transcriptions programmatically with the API and how we will link this to our Next.js app.
Hey! 👋 I'm Mike, a seasoned web developer with 5 years of full-stack expertise. Passionate about tech's impact on the world, I'm on a journey to blend code with compelling stories. Let's explore the tech landscape together! 🚀✍️