Reading · 10 min · Lesson 1 of 5

Your Data and AI Models

Every time you use an AI tool — from chatbots to job search platforms — you are sharing data. Understanding what happens to that data is one of the most valuable skills you can build in today's digital world.

What Data Do AI Models Actually Use?

AI models like the ones powering Rafiki, ChatGPT, or Google Gemini were built by training on enormous collections of text, images, and other information. But training data is just the beginning. When you use an AI tool today, you are generating new data in real time — your questions, your preferences, even how long you pause before typing.

🧠

Key Concept: There are two types of data to think about with AI — training data (used to build the model) and usage data (collected when you interact with it). Both matter for your privacy.

When you type a message to an AI assistant, that message may be stored on servers, reviewed by engineers to improve the product, used to personalise future responses, or in some cases shared with third parties. The exact rules depend on the platform's privacy policy — a long document most of us never read.

A Nairobi Example: Searching for a Job

Imagine you are a recent Kenyatta University graduate looking for a data analyst job in Nairobi. You open a popular AI-powered job platform and type: "I have a degree in statistics, I live in Westlands, and I need a job paying at least KES 80,000." You have just shared your education level, location, and salary expectations with a system you may know very little about.
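To see how little effort it takes to turn that one sentence into a structured profile, here is a minimal Python sketch. The patterns and field names are assumptions made up for this example message only; a real platform would likely use far more capable tools.

```python
import re

MESSAGE = (
    "I have a degree in statistics, I live in Westlands, "
    "and I need a job paying at least KES 80,000."
)

# Hypothetical patterns, written only to match this one example message.
PATTERNS = {
    "education": r"degree in (\w+)",
    "location": r"live in (\w+)",
    "minimum_salary_kes": r"KES\s*([\d,]+)",
}


def extract_profile(message: str) -> dict:
    """Return the personal details that the simple patterns can find."""
    profile = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match:
            profile[field] = match.group(1)
    return profile


profile = extract_profile(MESSAGE)
print(profile)
```

Three short patterns are enough to pull out your field of study, your neighbourhood, and your salary floor from a single chat message.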

⚠️

Think Before You Type: AI chat boxes feel conversational and private, like talking to a friend on WhatsApp. But WhatsApp messages are end-to-end encrypted, whereas many AI platforms can read and store your inputs, sometimes indefinitely, and use them to improve their systems. Treat them more like an email to a company than a private chat.

How AI Models Learn From You

Many AI services improve their models using feedback from users. When you give a thumbs-up or thumbs-down to a response, that rating can be used to train future versions of the model. When you rephrase a question because the first answer was bad, that rephrasing is also data. This is how AI companies improve their products, but it also means your interactions can live on long after you close the browser tab.

Prompts you type may be stored and reviewed by human trainers
Your device type and location are often logged automatically
How you interact — which suggestions you click, how quickly you respond — is tracked as behavioural data
Account information links all of this back to your identity if you are logged in
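The items above can be pictured as a single log entry per message. The sketch below is a hypothetical record, not any real platform's format; every field name in it is an assumption chosen to mirror the list.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class UsageRecord:
    """Hypothetical log entry for one chat message (all field names assumed)."""
    prompt: str                 # the text you typed
    device_type: str            # logged automatically
    approx_location: str        # logged automatically
    seconds_to_respond: float   # behavioural data
    clicked_suggestion: bool    # behavioural data
    account_id: Optional[str]   # None when you are not logged in


def is_linked_to_account(record: UsageRecord) -> bool:
    """True when the record carries an account ID that ties it to a user."""
    return record.account_id is not None


logged_in = UsageRecord(
    prompt="Find me data analyst jobs in Nairobi",
    device_type="Android phone",
    approx_location="Nairobi",
    seconds_to_respond=4.2,
    clicked_suggestion=True,
    account_id="user-001",  # made-up ID for illustration
)
anonymous = UsageRecord(
    prompt="Find me data analyst jobs in Nairobi",
    device_type="Android phone",
    approx_location="Nairobi",
    seconds_to_respond=4.2,
    clicked_suggestion=True,
    account_id=None,
)

for record in (logged_in, anonymous):
    print(asdict(record)["prompt"], "| linked:", is_linked_to_account(record))
```

Note that even the record without an account ID still holds the prompt, device, and location, which can sometimes be enough to narrow down who sent it.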

Personal Data vs. Sensitive Data

Not all data carries the same risk. Your name or email address is personal data: it identifies you. Some personal data is also sensitive data, such as your health condition, ethnic background, political opinions, or financial situation. Sensitive data can be used to discriminate against you or cause you real harm if exposed.

💡

Practical Rule: Before sharing any information with an AI tool, ask: "Would I be comfortable if my future employer, my bank, or a stranger on the internet saw this?" If the answer is no, keep it off the platform — or at minimum, use a version that does not include identifying details.
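One way to produce a version without identifying details is to replace them with placeholders before you paste text into an AI tool. The following is a rough Python sketch under stated assumptions: the patterns cover only common email formats, Kenyan-style mobile numbers, and KES amounts, and the sample address and number are invented. It will miss names, places, and many other details, so it is no substitute for reading your text before sending it.

```python
import re

# Illustrative patterns only: they catch common formats, not every case.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    (re.compile(r"(?:\+254|0)[17]\d{8}\b"), "[PHONE]"),
    (re.compile(r"KES\s*[\d,]+"), "[AMOUNT]"),
]


def redact(text: str) -> str:
    """Replace each matched detail with a neutral placeholder."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text


# Made-up contact details, used only for this demonstration.
draft = (
    "Email me at wanjiru@example.com or call 0712345678. "
    "I need at least KES 80,000."
)
safe_draft = redact(draft)
print(safe_draft)
```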

What Responsible AI Companies Should Do

Good AI products are designed with privacy in mind from the start — a principle called privacy by design. This means they collect only what they need, store it safely, tell you clearly what they are doing, and give you control over your data. When a platform gives you options to delete your history, opt out of training, or download your data, those are signs of a more responsible product.
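Two of these ideas, collecting only what is needed and giving you control, can be sketched in a few lines of Python. This is a simplified illustration, not how any particular product works; the field names, the `allow_training` setting, and the opt-in default are all assumptions.

```python
# Hypothetical: the only fields an imaginary job-matching feature needs.
REQUIRED_FIELDS = {"skills", "preferred_role"}


def minimise(submitted: dict) -> dict:
    """Keep only the fields the feature needs and drop everything else."""
    return {key: value for key, value in submitted.items() if key in REQUIRED_FIELDS}


def may_use_for_training(settings: dict) -> bool:
    """Assumed opt-in design: use data for training only if the user said yes."""
    return settings.get("allow_training", False) is True


submitted = {
    "skills": "statistics",
    "preferred_role": "data analyst",
    "home_area": "Westlands",        # not needed, so it is dropped
    "salary_expectation": "80,000",  # not needed, so it is dropped
}
stored = minimise(submitted)
print(stored)
print(may_use_for_training({}))
```

In this sketch, a user who has never touched their settings is treated as not having agreed to training, which is the more privacy-protective default.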

In the next lesson, you will learn about the Kenya Data Protection Act — the law that gives you specific rights over your personal data and holds organisations accountable for how they handle it.
