
Learning Loop ♾️ 09-2025

Your bi-weekly dose of AI learning, without the hype.

Hello peeps,

A very warm welcome to the 9th edition of AIpeeps!

What we’re covering today

  • Claude + Styles → finally, AI that can sound more like you

  • Reward Function → how to teach AI what “good” means (and get better outputs)

  • ChatGPT’s Last Training Date → the real reason it sometimes feels outdated

  • Reinforcement Learning → how AI learns just like we do — by playing, failing, and trying again

💡 3 Curious things I learnt

1. Define your tone to sound more like you (Claude + Styles)

You no longer have to settle for AI-generated content that doesn’t sound like you.

Claude offers a feature called “Styles”, and it’s exactly what we’ve been waiting for.

You can now define your own writing style, instead of just asking the model to write in a formal, professional, or friendly tone… haha.

I literally named my tone after myself: “Gaurav.”

Here’s how you can do it 👇

Go to Claude → Control Adjustment → Create & edit Style

[Image: the style option in Claude]

Now you’ve got two options:

  1. Upload a few articles, texts, or emails written by you.

  2. Or simply describe how your tone sounds.

[Image: the two style options in Claude]

As an experiment, I’ve only described my tone for now (I plan to upload documents later to refine it further).

Here’s how I described it:

The output feels surprisingly close to my natural voice: around 60–70% accurate.

I’ve actually used some of that tone-generated text in this very newsletter.

See if you can spot which line feels slightly off from my natural tone 😉
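
By the way, if you’d rather work through the API than the Claude app, here’s a rough sketch of the same idea: passing a tone description as the system prompt via Anthropic’s Python SDK. This isn’t the Styles feature itself, just an API-level equivalent, and the model name and tone description below are placeholders you’d swap for your own.

```python
# Sketch: approximating a custom "style" by putting a tone description in the system prompt.
# Assumes the `anthropic` package is installed and ANTHROPIC_API_KEY is set in your environment.
import anthropic

MY_TONE = """Write like Gaurav: short sentences, conversational, a light emoji here
and there, analogies over jargon, and always end with a question to the reader."""

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # swap in whichever Claude model you have access to
    max_tokens=500,
    system=MY_TONE,                    # the tone description acts like a reusable style
    messages=[{"role": "user", "content": "Draft a short intro for my newsletter."}],
)

print(response.content[0].text)
```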

2. Define Your “Reward Function” Upfront

When you prompt an AI, you’re not just asking a question — you’re actually teaching it through mini reinforcement learning.

Every time you:

  • Rephrase a prompt because the first answer wasn’t right → you’re adjusting the action for a better reward.

  • Give thumbs up/down feedback → you’re literally providing reward signals to help the system improve.

  • Iterate on responses (“Make it concise” or “Add examples”) → you’re teaching the model what “good” looks like (see the sketch below).
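
Under the hood, that feedback boils down to very simple data. Here’s a purely conceptual sketch (not any provider’s real schema) of what a thumbs up or down turns into:

```python
# Conceptual sketch of feedback becoming reward signals (not any provider's real schema).
from dataclasses import dataclass

@dataclass
class FeedbackEvent:
    prompt: str
    response: str
    reward: int  # +1 for a thumbs up, -1 for a thumbs down

log = [
    FeedbackEvent("Summarize this article", "Here's a 3-bullet summary...", +1),
    FeedbackEvent("Make it concise", "A 400-word essay...", -1),
]

# Aggregated across many users, signals like these can feed back into training
# (e.g. RLHF-style fine-tuning) so the model learns what "good" looks like.
average_reward = sum(e.reward for e in log) / len(log)
print(average_reward)
```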

So, if you want to make your prompts work smarter — define your reward function inside the prompt.

 Strong example:

“Write a marketing email that feels personal (not salesy), keeps busy executives reading (under 150 words), and ends with a soft call-to-action (not pushy).”

You’ve now built a reward map inside your prompt:

  • Points for a personal tone (minus points for sounding salesy)

  • Points for brevity (under 150 words)

  • Points for a gentle CTA (minus points for being pushy)

I even ran this prompt both with and without those signals — you can clearly see the difference in the result. Check Out the Results Here
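
To make the “reward map” idea concrete, here’s a small sketch of those three criteria written out as an actual scoring function. The keyword lists and checks are invented for illustration; the point is simply that every criterion in the prompt maps to a rule that adds or subtracts points.

```python
# Sketch: the prompt's "reward map" written out as a scoring function.
# Keyword lists and checks are illustrative assumptions, not a real evaluator.
SALESY_WORDS = {"buy now", "limited time", "act fast", "exclusive offer"}
PUSHY_CTAS = {"sign up now", "don't miss out", "click immediately"}

def reward(email_text: str) -> int:
    text = email_text.lower()
    score = 0

    # Points for personal tone (a crude proxy: direct address), minus points for salesy phrases
    score += 1 if "you" in text else 0
    score -= sum(phrase in text for phrase in SALESY_WORDS)

    # Points for brevity: under 150 words, as the prompt asked
    score += 1 if len(text.split()) <= 150 else -1

    # Minus points for a pushy call-to-action
    score -= sum(phrase in text for phrase in PUSHY_CTAS)

    return score
```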

3. ChatGPT’s Last Training Date

You may not believe this, but ChatGPT’s knowledge cutoff (the last date of its training data) is June 2024.

Still not convinced? Here’s the screenshot and a link to the chat.

[Image: ChatGPT’s reported training cutoff]

So if you switch off the new web search setting ChatGPT recently added, it’ll start giving weird answers for anything after June 2024, basically becoming outdated in seconds.

That also means ChatGPT now depends entirely on real-time updates from humans on the web.

🤔 What’s your take on that? How replaceable are humans, really?

🕵🏻 Decoding the Jargon

Reinforcement Learning

Imagine teaching a dog to fetch.

You don’t explain physics or timing — you just throw the ball.
When the dog brings it back, it gets a treat.
Wrong move? No treat.

Over time, the dog figures out what works.
That’s reinforcement learning.

Now replace the dog with an AI agent — a bot, a game player, or a trading model.
It tries different actions, gets a “score” for each move — high for good, low for bad.
No manual. Just trial, error, and feedback.

It’s kind of like learning to ride a bike.

You wobble, fall, adjust, and eventually your body learns balance — not by theory, but by thousands of micro-corrections.

The real magic?

You don’t tell AI how to do it — you just define what success looks like.
“Win the game.” “Maximize profit.” “Stay upright.”

The AI then discovers the how — through repetition, feedback, and reward.

In short — trial, error, and reward.

It’s nature’s oldest teaching method — just rebranded for the AI age.
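
If you want to see trial, error, and reward in about twenty lines, here’s a toy sketch: an epsilon-greedy agent figuring out which of three slot machines pays best. It’s nothing like how a large model is trained, just the smallest possible version of the loop above, and the payout probabilities are made up.

```python
# Toy reinforcement learning: an epsilon-greedy agent learns which "arm" pays best.
# The hidden payout probabilities are invented for illustration.
import random

true_payouts = [0.2, 0.5, 0.8]   # hidden reward probability of each arm
estimates = [0.0, 0.0, 0.0]      # the agent's learned value of each arm
pulls = [0, 0, 0]
epsilon = 0.1                    # how often the agent explores at random

for step in range(10_000):
    # Explore sometimes, otherwise exploit the best-known arm
    if random.random() < epsilon:
        arm = random.randrange(3)
    else:
        arm = max(range(3), key=lambda a: estimates[a])

    # The environment hands back a reward: 1 (treat) or 0 (no treat)
    reward = 1 if random.random() < true_payouts[arm] else 0

    # Update the running-average estimate for that arm
    pulls[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / pulls[arm]

print("Learned values:", [round(e, 2) for e in estimates])
```

Run it a few times: the agent ends up pulling the best arm most of the time, without ever being told which one it is.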

Try it out this week:

✔️ Try creating your own style in Claude.

✔️ Define a reward function inside your prompts to sharpen the outputs.

If this helped you see AI differently, pass it to three friends who keep overthinking every AI headline… instead of learning how it actually works.

Share the love, flex what you learned, and let’s grow this convo. 🚀

Please feel free to share any questions or topics you're particularly curious about @ [email protected]

Hope you learned something new today!

Till next time,
