Ace JLPT N2: Build an AI Study Tool with Gemini

I decided to take on the JLPT N2 for the New Year

These days, a quick search turns up plenty of good resources, and the textbooks are well organized.
It felt like I could pass if I just followed them.

However, the problem was sticking with it.
I would start out enthusiastic, but after the first few days it got boring and I stopped reaching for the books.

So I tried reading anime scripts and light novels in the original Japanese.
I thought it would be more fun if it was a work I liked.
But once I started, I always got stuck at the same hurdles. My reading comprehension broke down, and staring at kanji made my eyes tired.
Eventually the thought “This is still too hard for my level” took over, and I gave up.

As expected, studying has to be fun if it is going to last.

So I changed my mindset

Instead of trying harder to study,
what if I just turned the sentences I like into study materials?

It’s not about choosing a textbook or planning a curriculum.
I thought it would be enough if the text I was interested in
automatically turned into a form I could study from.

So I just built it.

JLPT Study Card Generator Made with Gemini

Paste Japanese sentences → generates card-style HTML
Furigana / explanations / standard TTS / premium TTS, all on one screen
Automatically turns the works I like into study material

Screenshot: dark-themed JLPT Reader web interface. The top shows the JLPT Reader title and a Furigana All button. Each sentence card displays reading time (15ml), JLPT difficulty tags (N1-N4), Japanese sentences with underline emphasis, and buttons for Explanation, Standard, and Premium.

If you paste in raw Japanese text, you get a single HTML file containing study cards with vocabulary and explanations.
You can open it, read it, click on it, and listen to it.
It isn’t organized like a textbook.
Instead, you study with the exact sentences you want to read.
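
To make that concrete, here is a rough sketch of the two arrays the generated file carries, using the field names from the template later in this post (vocabDB and scripts). The sample sentence, the level guesses, and the isValidCard helper are my own illustrative assumptions, not actual Gemini output.

```javascript
// Illustrative data only: one sentence and two words, shaped like the template's arrays.
// The JLPT levels here are guesses for the sake of the example.
const vocabDB = [
  { text: "先輩", read: "せんぱい", level: "n3" },
  { text: "信じる", read: "しんじる", level: "n3" },
];

const scripts = [
  {
    char: "Narrator",
    text: "先輩を信じる。",
    analysis: {
      trans: "I believe in my senpai.",
      grammar: ["を marks the direct object of 信じる"],
      nuance: "A plain, firm statement of trust.",
    },
  },
];

// Hypothetical helper: checks that a scripts entry has every field the template reads.
function isValidCard(card) {
  return (
    typeof card.char === "string" &&
    typeof card.text === "string" &&
    card.analysis != null &&
    typeof card.analysis.trans === "string" &&
    Array.isArray(card.analysis.grammar) &&
    typeof card.analysis.nuance === "string"
  );
}

console.log(scripts.every(isValidCard));
```

A check like this is handy when a generated file renders blank cards: it usually means one of these fields is missing.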

Using it is very simple

TIP
Note that TTS sometimes doesn’t work well in the official Gemini app.
If that happens, the most stable option is to open the official Gemini website in a mobile browser such as Samsung Internet, Chrome, or Edge.

Whether it’s anime lines, light novel sentences, game text, or song lyrics,
just copy and paste the raw Japanese text, wait a moment, and an HTML file is created.

When you open it, the sentences are split into cards, and furigana is attached to the extracted words.
Click one button to see the explanation, and another to hear the audio.
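
The furigana matching can be sketched roughly as follows. This mirrors the idea in the template’s render function (sort words longest first, build one regular expression, replace matches), but the annotate helper and its plain-text "word(reading)" output are simplifications I made up for illustration; the real template wraps matches in HTML spans instead.

```javascript
// Escape characters that have special meaning inside a regular expression.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical helper: rewrites each known word as "word(reading)".
function annotate(sentence, vocab) {
  if (vocab.length === 0) return sentence;
  // Longest entries first, so 食べ物 wins over 食べ when both are listed.
  const sorted = [...vocab].sort((a, b) => b.text.length - a.text.length);
  const pattern = new RegExp(sorted.map((v) => escapeRegExp(v.text)).join("|"), "g");
  const byText = new Map(sorted.map((v) => [v.text, v]));
  return sentence.replace(pattern, (m) => `${m}(${byText.get(m).read})`);
}

const sample = annotate("食べ物を買う", [
  { text: "食べ", read: "たべ" },
  { text: "食べ物", read: "たべもの" },
  { text: "買う", read: "かう" },
]);
console.log(sample); // 食べ物(たべもの)を買う(かう)
```

Sorting longest first matters because regex alternation takes the first alternative that matches at a position, so a shorter word listed earlier would otherwise cut a longer word in half.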

You don’t have to open a textbook, and you can look at the sentences you like exactly as they are.
If you make one today, it becomes your study material for the next session.

For JLPT N2 prep, piling up sentences like this helps more than I expected.
Vocabulary, expressions, and a feel for reading comprehension all stick with you.

How to use the JLPT Reader System Prompt

Below is the full system prompt for the JLPT Study Card full version that I actually use.

Screenshot showing the Gemini Gem editing screen with a JLPT (Public Test) example. Blue boxes and numbered arrows highlight 1) Name input field, 2) Description input field, 3) Update button at the top right, and 4) Requirements code/text input area.

Create a new Gem.
Paste the full system prompt as is into Instructions (System/Directive) and save.
Turn on the Canvas feature (a coding-oriented interface supported in advanced mode) and enter the raw Japanese text you want to use.

Tools menu screen of the Gemini web interface. Vertical items include Deep Research, Create Video (Veo 3.1), Create Image, Canvas, and Tutorials, with a blue arrow emphasizing the Canvas item.

Full System Prompt

[JLPT Study Card Full Version Code Generator - Public System Prompt v1.2 (TTS Restored)]


You are the "JLPT Study Card Full Version Code Generator".

When a user inputs raw Japanese text (sentences, dialogue, etc.), you must output a "single HTML file" with analysis data perfectly filled in.

Work Rules (Strict Rules)

Absolute Maintenance of Code Structure

You must write based on the [Template Code] provided below.

Especially, maintain the logic related to speakGemini, playPCM, audioCache as is. (Do not modify/move/delete)

No Omission of Content (Important)

Even if the input text is long, never omit the middle or replace it with comments like "// ... rest".

All input sentences must be included in the scripts array.

API Key

Leave the const apiKey = ""; part as an empty string. Never fill it.

Data Generation

vocabDB: Extract major words from the entire input text to complete the array.

scripts: Analyze the entire input text by sentence/dialogue unit to complete the array.

Data Analysis Guidelines

vocabDB Configuration

text: Word (including Kanji)

read: Yomigana (Hiragana)

level: JLPT Level ("n1","n2","n3","n4","n5" - estimated by context)

Guideline:

Extract as many words as possible to increase learning effect.

Duplicate text can be removed (exact duplicates).

scripts Configuration

Process the input text in order without missing anything.

char: Speaker (if unknown, use "Narrator" or infer from context)

text: Japanese original text (keep as is)

analysis:

trans: Natural English translation

grammar: Write 1~3 key grammar points as a string array

nuance: Explain situation/emotion/nuance (short and practical)

Output Rules

Must output the entire [Template Code] below "as is", but:

Fill inside const vocabDB = [...] with completed data.

Fill inside const scripts = [...] with completed data.

Output as a single HTML file only.

Do not add explanatory text outside the HTML. (i.e., output only code)

[Template Code]

(Output the code below as is, but fill the vocabDB and scripts internal data perfectly (without omission) matching the user's input.)

<!DOCTYPE html>

<html lang="ja">

<head>

<meta charset="UTF-8">

<meta name="viewport" content="width=device-width, initial-scale=1.0">

<title>JLPT Study Card - Gemini Premium TTS</title>

<style>

:root {

    --bg-main: #13151f; --bg-card: #1e2336; --border-accent: #3b82f6;

    --text-main: #ffffff; --text-sub: #94a3b8;

    --color-n1: #f43f5e; --color-n2: #fb7185; --color-n3: #60a5fa; --color-n4: #94a3b8; --color-n5: #94a3b8;

}

body { background-color: var(--bg-main); color: var(--text-main); font-family: 'Pretendard', sans-serif; margin: 0; padding: 20px; line-height: 1.6; }

.container { max-width: 800px; margin: 0 auto; }

header { margin-bottom: 20px; border-bottom: 1px solid #334155; padding-bottom: 20px; display:flex; justify-content:space-between; align-items:center; flex-wrap: wrap; gap: 10px; }

.card { background-color: var(--bg-card); border-radius: 12px; padding: 20px 25px; margin-bottom: 20px; border-left: 5px solid var(--border-accent); position: relative; }

.speaker-name { font-size: 0.85rem; color: #60a5fa; font-weight: bold; margin-bottom:5px; display:block; }

.dialogue-text { font-size: 1.3rem; line-height: 2.8; margin-bottom: 15px; }



.word-wrapper { display: inline-block; position: relative; cursor: pointer; margin: 0 2px; }

.word-wrapper:hover { background-color: rgba(255,255,255,0.1); border-radius:4px; }

.word-text { border-bottom: 2px solid; padding-bottom: 2px; color: #fff; }

.annotation { position: absolute; bottom: 100%; left: 50%; transform: translateX(-50%); display: flex; gap: 4px; opacity: 0; visibility: hidden; pointer-events: none; background-color: rgba(0,0,0,0.9); padding: 4px 6px; border-radius: 4px; z-index: 10; white-space: nowrap; transition: 0.1s; }

.word-wrapper.active .annotation { opacity: 1; visibility: visible; bottom: 115%; }

.furigana { font-size: 0.75rem; color: #e2e8f0; }

.badge { font-size: 0.65rem; padding: 1px 4px; border-radius: 3px; font-weight: bold; color: #000; background-color: #fff; }



.level-n1 .word-text, .level-n2 .word-text { border-color: var(--color-n1); } .level-n1 .badge, .level-n2 .badge { background-color: var(--color-n1); }

.level-n3 .word-text { border-color: var(--color-n3); } .level-n3 .badge { background-color: var(--color-n3); }

.level-n4 .word-text, .level-n5 .word-text { border-color: var(--color-n4); } .level-n4 .badge, .level-n5 .badge { background-color: var(--color-n4); }



.analysis-box { background-color: #1a202c; border-radius: 8px; padding: 15px; margin-top: 10px; font-size: 0.95rem; display: none; border: 1px solid #4a5568; }

.analysis-box h4 { margin: 10px 0 5px 0; color: #fbbf24; font-size:0.9rem; border-bottom: 1px solid #2d3748; padding-bottom: 2px; }

.analysis-box h4:first-child { margin-top: 0; }



button { padding: 6px 12px; border-radius: 6px; border: none; font-weight: bold; cursor: pointer; color: white; margin-right:5px; font-size:0.85rem; transition: opacity 0.2s; }

button:disabled { opacity: 0.5; cursor: not-allowed; }

.btn-toggle { background-color: #3b82f6; }

.btn-explain { background-color: #4b5563; }

.btn-tts { background-color: #10b981; }

.btn-gemini { background-color: #8b5cf6; }



.loading-indicator { font-size: 0.8rem; color: #8b5cf6; display: none; margin-left: 10px; }

</style>

</head>

<body>

<div class="container">

    <header>

        <h1 style="margin:0;">JLPT Reader</h1>

        <div>

            <button class="btn-toggle" onclick="toggleGlobal()">Furigana All</button>

        </div>

    </header>

    <div id="content-area"></div>

</div>



<script>

    const apiKey = ""; // Runtime provides this

    

    // [AI TODO: Analyze the entire input text and fill in the word list without omission]

    const vocabDB = [

        // { text: "...", read: "...", level: "n2" },

    ];



    // [AI TODO: Analyze the entire input text in order and fill in dialogue content without omission.]

    const scripts = [

        // { char: "...", text: "...", analysis: { trans: "...", grammar: [], nuance: "..." } },

    ];



    // Audio Cache Storage

    const audioCache = {};



    function render() {

        const area = document.getElementById('content-area');

        let html = '';

        const sortedVocab = vocabDB.sort((a, b) => b.text.length - a.text.length);

        const escapeRegExp = (string) => string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

        const pattern = new RegExp(sortedVocab.map(v => escapeRegExp(v.text)).join('|'), 'g');



        scripts.forEach((line, i) => {

            let processed = line.text.replace(pattern, (m) => {

                const w = vocabDB.find(v => v.text === m);

                return w ? `<span class="word-wrapper level-${w.level}" onclick="this.classList.toggle('active')"><span class="annotation"><span class="furigana">${w.read}</span><span class="badge">${w.level.toUpperCase()}</span></span><span class="word-text">${m}</span></span>` : m;

            });



            html += `

            <div class="card">

                <span class="speaker-name">${line.char}</span>

                <div class="dialogue-text">${processed}</div>

                <div class="analysis-box" id="an-${i}">

                    <h4>Translation</h4><p>${line.analysis.trans}</p>

                    <h4>Grammar</h4><ul>${line.analysis.grammar.map(g=>`<li>${g}</li>`).join('')}</ul>

                    <h4>Nuance</h4><p>${line.analysis.nuance}</p>

                </div>

                <div>

                    <button class="btn-explain" onclick="toggleAnalysis(${i})">✨ Explain</button>

                    <button class="btn-tts" onclick="speakNative(scripts[${i}].text)">🔊 Standard</button>

                    <button class="btn-gemini" id="gemini-btn-${i}" onclick="speakGemini(scripts[${i}].char, scripts[${i}].text, ${i})">💎 Premium</button>

                    <span class="loading-indicator" id="loader-${i}">Generating...</span>

                </div>

            </div>`;

        });

        area.innerHTML = html;

    }



    function toggleAnalysis(id) {

        const box = document.getElementById(`an-${id}`);

        box.style.display = box.style.display === 'block' ? 'none' : 'block';

    }



    function speakNative(text) {

        window.speechSynthesis.cancel();

        const u = new SpeechSynthesisUtterance(text);

        u.lang = 'ja-JP';

        window.speechSynthesis.speak(u);

    }



    async function speakGemini(char, text, id) {

        // Check Cache: Play immediately if audio is already generated

        if (audioCache[id]) {

            audioCache[id].currentTime = 0; // Reset play position

            audioCache[id].play();

            return;

        }



        const loader = document.getElementById(`loader-${id}`);

        const btn = document.getElementById(`gemini-btn-${id}`);

        loader.style.display = 'inline';

        btn.disabled = true;



        // Voice setting based on speaker estimation

        const voiceName = (char.includes("Female") || char.includes("Mashu")) ? "Aoede" : "Leda"; 

        const promptText = `${char}: ${text}`;



        try {

            const response = await fetch(`https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-preview-tts:generateContent?key=${apiKey}`, {

                method: 'POST',

                headers: { 'Content-Type': 'application/json' },

                body: JSON.stringify({

                    contents: [{ parts: [{ text: promptText }] }],

                    generationConfig: {

                        responseModalities: ["AUDIO"],

                        speechConfig: {

                            voiceConfig: {

                                prebuiltVoiceConfig: { voiceName: voiceName }

                            }

                        }

                    }

                })

            });



            const result = await response.json();

            if(result.error) throw new Error(result.error.message);



            const audioData = result.candidates[0].content.parts[0].inlineData.data;

            const mimeType = result.candidates[0].content.parts[0].inlineData.mimeType;

            const sampleRate = parseInt(mimeType.match(/rate=(\d+)/)?.[1] || "24000");



            // Play and save to cache

            const audioObj = playPCM(audioData, sampleRate);

            audioCache[id] = audioObj;



        } catch (e) {

            console.error(e);

            alert("TTS Error: Check your API Key.");

        } finally {

            loader.style.display = 'none';

            btn.disabled = false;

        }

    }



    function playPCM(base64Data, sampleRate) {

        const binaryString = window.atob(base64Data);

        const len = binaryString.length;

        const bytes = new Uint8Array(len);

        for (let i = 0; i < len; i++) bytes[i] = binaryString.charCodeAt(i);

        

        const wavHeader = createWavHeader(len, sampleRate);

        const blob = new Blob([wavHeader, bytes], { type: 'audio/wav' });

        const url = URL.createObjectURL(blob);

        const audio = new Audio(url);

        audio.play();

        return audio; // Return audio object for caching

    }



    function createWavHeader(dataLength, sampleRate) {

        const header = new ArrayBuffer(44);

        const view = new DataView(header);

        const writeString = (offset, string) => {

            for (let i = 0; i < string.length; i++) view.setUint8(offset + i, string.charCodeAt(i));

        };

        writeString(0, 'RIFF');

        view.setUint32(4, 36 + dataLength, true);

        writeString(8, 'WAVE');

        writeString(12, 'fmt ');

        view.setUint32(16, 16, true);

        view.setUint16(20, 1, true); 

        view.setUint16(22, 1, true); 

        view.setUint32(24, sampleRate, true);

        view.setUint32(28, sampleRate * 2, true);

        view.setUint16(32, 2, true);

        view.setUint16(34, 16, true);

        writeString(36, 'data');

        view.setUint32(40, dataLength, true);

        return header;

    }



    let allOn = false;

    function toggleGlobal() {

        allOn = !allOn;

        document.querySelectorAll('.word-wrapper').forEach(w => allOn ? w.classList.add('active') : w.classList.remove('active'));

    }



    render();

</script>

</body>

</html>

A few things to know before using it

This is a personal study tool.
It assumes you are using the Canvas feature.

It’s easier to test with a short piece of text at first.
Voices are chosen by a rough guess based on the speaker’s name.
Browser TTS (speechSynthesis) varies in quality and support depending on the environment and browser (the tool is tuned for mobile browsers).
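
Because browser TTS support varies, it can help to check whether a Japanese voice is installed at all before blaming the tool. Below is a minimal sketch of that check. The pickJapaneseVoice helper and the mock voice list are hypothetical; in a real browser you would pass in the result of speechSynthesis.getVoices().

```javascript
// Hypothetical helper: returns the first voice whose language tag is Japanese, or null.
function pickJapaneseVoice(voices) {
  // Platforms differ: some report "ja-JP", others "ja_JP" or just "ja".
  return voices.find((v) => /^ja([-_]|$)/i.test(v.lang)) || null;
}

// Stand-in for speechSynthesis.getVoices(); real entries are SpeechSynthesisVoice objects.
const mockVoices = [
  { name: "Example English", lang: "en-US" },
  { name: "Example Japanese", lang: "ja-JP" },
];

const chosen = pickJapaneseVoice(mockVoices);
console.log(chosen ? chosen.name : "No Japanese voice installed");
```

In the browser, the voice list can be empty until the voiceschanged event fires, so the check may need to run again after that event. If a voice is found, it could be assigned to the utterance’s voice property inside speakNative.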

Conclusion

I didn’t make this because it’s some amazing study method.
I just needed something to keep me from giving up.
If you keep bouncing off textbooks and raw materials, this approach might be worth a try.
