
Commit e02c809

Merge pull request #155 from codeharborhub/dev-1
content added for ml
2 parents 706c888 + 0650088 commit e02c809

4 files changed: +323 -0 lines changed
Lines changed: 87 additions & 0 deletions

---
title: "Reinforcement Learning: Learning through Action"
sidebar_label: Reinforcement Learning
description: "Understanding the Agent-Environment loop, reward signals, and how AI learns to make optimal decisions in dynamic systems."
tags: [machine-learning, reinforcement-learning, robotics, q-learning, autonomous-systems]
---

**Reinforcement Learning (RL)** is a type of machine learning where an **Agent** learns to make decisions by performing actions in an **Environment** to maximize a cumulative **Reward**.

Unlike Supervised Learning, where the model is told the "correct" answer, an RL agent learns from the consequences of its actions. It is a process of trial and error, much like how a human learns to ride a bicycle or how a dog is trained with treats.

## 1. The Core Components

To understand RL, you must understand the five pillars of the "RL Loop":

1. **The Agent:** The AI "learner" or decision-maker (e.g., the software controlling a self-driving car).
2. **The Environment:** The world the agent interacts with (e.g., the road and traffic).
3. **State ($S$):** The current situation of the agent (e.g., the car's current speed and position).
4. **Action ($A$):** What the agent does (e.g., steer left, brake, or accelerate).
5. **Reward ($R$):** Feedback from the environment (e.g., $+10$ points for reaching the destination, $-100$ for a collision).

## 2. The Learning Loop

The process is continuous and follows a cycle:

```mermaid
graph TD
    A[Agent] -->|Action| E[Environment]
    E -->|Reward| A
    E -->|New State| A

    style A fill:#e1f5fe,stroke:#01579b,color:#333
    style E fill:#f3e5f5,stroke:#7b1fa2,color:#333
```

1. The Agent observes the current **State**.
2. The Agent selects an **Action** based on its "Policy" (strategy).
3. The **Environment** changes in response to the action.
4. The Agent receives a **Reward** or penalty.
5. The Agent updates its strategy to prioritize actions that led to rewards (a minimal code sketch of this loop follows the list).

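
The loop above maps almost directly onto code. Below is a minimal sketch of tabular Q-learning, assuming a hypothetical `env` object with gym-style `reset()` and `step()` methods and small, discrete state/action spaces; the sizes and hyperparameters are illustrative only.

```python
import numpy as np

# Assumed: a tiny, hypothetical environment with gym-style reset()/step() and
# discrete states/actions. step() is assumed to return (next_state, reward, done).
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))        # the agent's current value estimates
alpha, gamma, epsilon = 0.1, 0.9, 0.1      # learning rate, discount factor, exploration rate

def run_episode(env):
    state = env.reset()                    # 1. observe the current State
    done = False
    while not done:
        if np.random.rand() < epsilon:     # 2. choose an Action (explore vs. exploit)
            action = np.random.randint(n_actions)
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done = env.step(action)  # 3.-4. Environment reacts, returns a Reward
        # 5. update the strategy: nudge Q toward "reward + discounted best future value"
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state
```

The update on the last line is step 5 of the loop: actions that led to reward get a higher value and become more likely to be chosen next time.
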
## 3. Exploration vs. Exploitation

This is the most famous dilemma in Reinforcement Learning:

* **Exploitation:** The agent performs the action it *knows* gives the highest reward (e.g., going to your favorite restaurant).
* **Exploration:** The agent tries a *new* action to see if it leads to an even better reward (e.g., trying a new restaurant that might be better or worse).

An effective RL agent must find the right balance: exploring early to discover high-value strategies and exploiting later to maximize "points."

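
A common heuristic for managing this trade-off is *epsilon-greedy* action selection with a decaying epsilon: explore a lot early, exploit more and more later. A minimal sketch (the schedule and constants below are arbitrary illustration choices, not part of any standard):

```python
import numpy as np

def choose_action(q_values, episode, eps_start=1.0, eps_end=0.05, decay=0.995):
    """Epsilon-greedy with exponential decay: mostly explore early, mostly exploit later."""
    epsilon = max(eps_end, eps_start * decay ** episode)
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))   # explore: try a random action
    return int(np.argmax(q_values))               # exploit: take the best-known action
```
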
## 4. Key Concepts: Policy and Value

* **Policy ($\pi$):** The agent's "brain" or strategy. It defines which action to take in a given state.
* **Value Function ($V(s)$):** The agent's prediction of the *total future reward* it will get from its current state.

$$
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots
$$

*In the formula above, $\gamma$ (gamma) is the **discount factor**, which determines how much the agent cares about immediate rewards vs. long-term rewards.*

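
As a quick sanity check of the return formula, here is a tiny helper that computes $G_t$ for a list of future rewards; the reward values and $\gamma$ are made up for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ..."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

print(discounted_return([1, 1, 10]))   # 1 + 0.9*1 + 0.81*10 = 10.0
```
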
## 5. Real-World Applications

| Industry | RL Use Case |
| --- | --- |
| **Gaming** | Training AI to beat world champions in Chess, Go, or StarCraft II. |
| **Robotics** | Teaching a robotic arm to pick up fragile objects without breaking them. |
| **Finance** | Algorithmic trading where the agent learns when to buy or sell for maximum profit. |
| **Healthcare** | Optimizing treatment plans for chronic diseases based on patient reactions. |
| **NLP** | **RLHF** (Reinforcement Learning from Human Feedback), used to make models like ChatGPT more helpful and safe. |

## 6. Comparison Table

| Feature | Supervised | Unsupervised | Reinforcement |
| --- | --- | --- | --- |
| **Goal** | Mapping inputs to outputs | Finding clusters/patterns | Maximizing rewards |
| **Data** | Labeled | Unlabeled | Interactions/Feedback |
| **Learning** | Passive | Passive | **Active/Interactive** |

## References for More Details

* **[OpenAI Spinning Up in RL](https://spinningup.openai.com/en/latest/):** Practical implementation and code for modern RL algorithms.

---

**You have now explored all the major branches of Machine Learning! You know how models learn from labels, patterns, themselves, and rewards. Now, it's time to learn how to measure if these models are actually doing a good job.**
Lines changed: 78 additions & 0 deletions

---
title: "Self-Supervised Learning: The Engine of Modern AI"
sidebar_label: "Self-Supervised Learning"
description: "How AI learns by predicting missing parts of its own input, powering Large Language Models and Computer Vision."
tags: [machine-learning, self-supervised-learning, llm, bert, computer-vision, deep-learning]
---

**Self-Supervised Learning (SSL)** is a paradigm where the model generates its own labels from the data itself. It eliminates the "bottleneck" of human labeling by hiding part of the input and asking the model to predict it.

If Supervised Learning is "Learning with a teacher," and Unsupervised Learning is "Learning alone," Self-Supervised Learning is **"Learning by solving puzzles."**

## 1. How it Works: The Pretext Task

In SSL, we create a **Pretext Task**: a synthetic challenge where the "ground truth" is already contained within the data.

### A. In Natural Language Processing (NLP)
This is how models like BERT and GPT are trained (BERT predicts masked words; GPT predicts the next word). We take a normal sentence and hide part of it:
* **Original:** "The cat sat on the mat."
* **Input:** "The cat [MASK] on the mat."
* **Target:** "sat"

By predicting the masked word, the model is forced to learn grammar, context, and even world facts. A minimal sketch of this masking step follows.

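
As a toy illustration of the masking idea (not how production tokenizers actually work), the following sketch hides one random word of a sentence and returns it as the training target:

```python
import random

def make_masked_example(sentence, mask_token="[MASK]"):
    """Hide one random word; the hidden word becomes the training target."""
    tokens = sentence.split()
    i = random.randrange(len(tokens))
    masked_input = tokens[:i] + [mask_token] + tokens[i + 1:]
    return " ".join(masked_input), tokens[i]

masked, target = make_masked_example("The cat sat on the mat")
print(masked, "->", target)   # e.g., "The cat [MASK] on the mat" -> "sat"
```
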
### B. In Computer Vision
We can take an image and modify it to create a puzzle for the model:
* **Rotation Prediction:** Rotate an image by 90° and ask the model, "What is the orientation?" To answer, the model must understand what a "head" or a "tree" looks like.
* **Jigsaw Puzzles:** Shuffle patches of an image and ask the model to reassemble them.
* **Colorization:** Give the model a black-and-white photo and ask it to predict the colors.

## 2. The Two-Stage Pipeline

Self-supervised learning is almost always used as a "pre-training" step before a specific downstream task.

1. **Stage 1: Pre-training (SSL):** Train on a massive, unlabeled dataset (e.g., all of Wikipedia) to learn general representations.
2. **Stage 2: Fine-tuning (Supervised):** Take that "smart" model and train it on a very small, labeled dataset for a specific task (e.g., medical sentiment analysis). A code sketch follows the diagram below.

```mermaid
graph TD
    Data[Massive Unlabeled Data] --> Pre[Pre-training: SSL]
    Pre --> Base[Base Model: Knows 'Context']
    Base --> Fine[Fine-tuning: Labeled Data]
    Fine --> Final[Specialized Model]

    style Pre fill:#f3e5f5,stroke:#7b1fa2,color:#333
    style Final fill:#e8f5e9,stroke:#2e7d32,color:#333
```

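
To make Stage 2 concrete, here is a minimal PyTorch-flavored sketch of fine-tuning, assuming a hypothetical encoder whose weights were already learned during SSL pre-training; the dimensions, data, and task are placeholders.

```python
import torch
import torch.nn as nn

# Hypothetical encoder standing in for a backbone whose weights were learned with SSL.
encoder = nn.Sequential(nn.Linear(768, 256), nn.ReLU())

# Stage 2: freeze the pre-trained encoder and train only a small task-specific head.
for param in encoder.parameters():
    param.requires_grad = False

head = nn.Linear(256, 2)                        # e.g., positive vs. negative sentiment
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One toy training step on a fake labeled batch of 8 examples.
x = torch.randn(8, 768)
y = torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = loss_fn(head(encoder(x)), y)
loss.backward()
optimizer.step()
print(float(loss))
```
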
## 3. Contrastive Learning

A popular modern approach to SSL is **Contrastive Learning**. Instead of predicting a missing part, the model learns to distinguish between "similar" and "different" things.

* **Positive Pair:** Two different crops of the same photo of a dog.
* **Negative Pair:** A photo of a dog and a photo of a car.
* **Goal:** The model learns to pull the "dog" representations together and push the "car" representation away in mathematical space (a toy version of this loss is sketched after the list).

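
Here is a minimal NumPy sketch of that "pull together, push apart" objective, in the spirit of the InfoNCE-style losses used by methods like SimCLR; the embeddings and temperature below are toy values.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: treat the positive pair as the 'correct class' among all candidates."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = np.array(sims) / temperature
    log_probs = logits - np.log(np.sum(np.exp(logits)))   # log-softmax over candidates
    return -log_probs[0]                                  # low loss = positive is clearly closest

# Toy embeddings: two crops of the same dog photo vs. a photo of a car.
dog_crop_a = np.array([0.9, 0.1, 0.0])
dog_crop_b = np.array([0.8, 0.2, 0.1])
car_photo  = np.array([0.0, 0.1, 0.9])
print(contrastive_loss(dog_crop_a, dog_crop_b, [car_photo]))
```

A lower loss means the anchor sits closer to its positive than to any negative in embedding space.
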
## 4. Why SSL is the Future

| Feature | Supervised Learning | Self-Supervised Learning |
| --- | --- | --- |
| **Data Source** | Human-curated labels | Raw, "wild" data (the Internet) |
| **Scalability** | Limited by human hours | Practically unlimited (scales with compute) |
| **Knowledge** | Narrow/Task-specific | General/Versatile |

## 5. Real-World Impact

* **Large Language Models (LLMs):** Every "GPT" model uses SSL to learn how to predict the next token.
* **Robotics:** Robots learning to understand their environment by predicting the next frame in a video feed.
* **Medical AI:** Pre-training on millions of unlabeled X-rays to understand "anatomy" before learning to spot specific rare diseases.

## References for More Details

* **[Yann LeCun - Self-Supervised Learning](https://ai.meta.com/blog/self-supervised-learning-the-dark-matter-of-intelligence/):** Understanding why the pioneers of AI believe SSL is "the dark matter of intelligence."
* **[Illustrated Word2Vec](https://jalammar.github.io/illustrated-word2vec/):** A visual guide to the earliest successful SSL techniques in NLP.

---

**You've now seen how AI learns from labeled data, patterns, and even itself. There is one final, radical way for AI to learn: by "playing" and receiving rewards.**
Lines changed: 72 additions & 0 deletions

---
title: "Semi-Supervised Learning: The Best of Both Worlds"
sidebar_label: Semi-Supervised Learning
description: "Combining small amounts of labeled data with large amounts of unlabeled data to improve model accuracy and reduce labeling costs."
tags: [machine-learning, semi-supervised-learning, data-labeling, pseudo-labeling, active-learning]
---

In the real world, data is plentiful, but **labels are expensive**.

* **Supervised Learning** requires every data point to be labeled by a human expert (expensive and slow).
* **Unsupervised Learning** uses no labels but can't perform specific tasks like classification.

**Semi-Supervised Learning (SSL)** sits in between. It uses a small set of labeled data to "guide" the discovery of patterns in a much larger set of unlabeled data.

## 1. The Core Idea: Self-Training & Pseudo-Labeling

The most common technique in SSL is **Pseudo-Labeling**. Instead of a human labeling millions of images, the model does most of it itself:

1. **Train:** Train a model on the small amount of human-labeled data.
2. **Predict:** Use that model to predict labels for the large unlabeled dataset.
3. **Filter:** Keep only the predictions where the model is highly confident (e.g., probability $> 95\%$). These are your "pseudo-labels."
4. **Retrain:** Combine the original human labels with the new pseudo-labels and train a final, more robust model (see the sketch after this list).

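
One way to run this recipe is scikit-learn's `SelfTrainingClassifier` (linked in the references below); the synthetic dataset, base model, and 95% confidence threshold here are just for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Toy setup: pretend only ~5% of the 1,000 samples were ever labeled by humans.
X, y = make_classification(n_samples=1000, random_state=0)
rng = np.random.default_rng(0)
y_train = y.copy()
unlabeled_mask = rng.random(len(y)) > 0.05
y_train[unlabeled_mask] = -1          # scikit-learn's convention for "no label"

# Train -> predict -> keep confident predictions (>95%) -> retrain, all inside one estimator.
self_training = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.95)
self_training.fit(X, y_train)

n_pseudo = int((self_training.transduction_ != -1).sum() - (~unlabeled_mask).sum())
print(f"human labels: {(~unlabeled_mask).sum()}, pseudo-labels added: {n_pseudo}")
```
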
## 2. Key Assumptions of SSL

For semi-supervised learning to work, the data must satisfy certain mathematical properties:

* **Continuity Assumption:** Points that are close to each other are likely to share the same label.
* **Cluster Assumption:** Data tends to form discrete clusters. Points in the same cluster are likely to share a label.
* **Manifold Assumption:** High-dimensional data lies on a lower-dimensional "manifold" or structure (like a 2D sheet crumpled in 3D space).

## 3. When to Use Semi-Supervised Learning

SSL is the standard approach in industries where expert labeling is prohibitively expensive:

* **Medical Imaging:** You have millions of X-rays, but only 1,000 have been analyzed by a specialized radiologist.
* **Language Translation:** Huge amounts of raw text exist, but human-translated pairs are limited.
* **Speech Analysis:** Thousands of hours of audio recordings, but only a fraction are transcribed.

## 4. SSL vs. Other Types

| Feature | Supervised | Unsupervised | Semi-Supervised |
| :--- | :--- | :--- | :--- |
| **Label Requirement** | 100% Labeled | 0% Labeled | ~1-10% Labeled |
| **Cost** | Very High | Low | Moderate |
| **Common Use Case** | Prediction | Discovery | Scaling Predictions |

```mermaid
graph TD
    LD[Small Labeled Dataset] --> M1[Initial Model]
    UD[Large Unlabeled Dataset] --> M1
    M1 --> PL[Pseudo-Labels]
    PL --> M2[Final Robust Model]
    LD --> M2

    style LD fill:#e8f5e9,stroke:#2e7d32,color:#333
    style UD fill:#fff3e0,stroke:#ef6c00,color:#333
    style PL fill:#e1f5fe,stroke:#01579b,color:#333
```

## 5. Active Learning: A Related Concept

Sometimes, SSL is combined with **Active Learning**. In this setup, the model identifies the specific unlabeled examples it is most "confused" about and asks a human expert to label *only those* specific points. This maximizes the value of human effort.

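
A minimal sketch of that "query the most confusing points" idea, using uncertainty sampling with a plain classifier; the dataset, the 25-label budget, and the batch size of 10 are arbitrary illustration choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=1)
labeled = np.zeros(len(y), dtype=bool)
labeled[:25] = True                                   # only 25 human-labeled points

model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])

# Uncertainty sampling: query the unlabeled points the model is least sure about.
proba = model.predict_proba(X[~labeled])
uncertainty = 1.0 - proba.max(axis=1)                 # low top probability = high confusion
query = np.flatnonzero(~labeled)[np.argsort(uncertainty)[-10:]]
print("send these 10 indices to a human annotator:", query)
```
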
## References for More Details

* **[Scikit-Learn: Self Training](https://scikit-learn.org/stable/modules/semi_supervised.html):** Learning about `SelfTrainingClassifier` and `LabelPropagation`.

---

**Semi-supervised learning is about making the most of what you have. But what if there are no labels at all, and the AI must learn by "playing" in an environment?**
Lines changed: 86 additions & 0 deletions

---
title: "Unsupervised Learning: Finding Hidden Structure"
sidebar_label: Unsupervised Learning
description: "Discovering patterns in unlabeled data through clustering, association, and dimensionality reduction."
tags: [machine-learning, unsupervised-learning, clustering, dimensionality-reduction, anomaly-detection]
---

In **Unsupervised Learning**, the model is given a dataset without explicit instructions on what to do with it. There are no labels ($y$), and there is no "teacher" to correct the model. Instead, the algorithm explores the data to find inherent structures, patterns, and groupings.

## 1. The Core Objective

The mathematical goal of unsupervised learning is to model the underlying probability distribution or structure of the input data ($X$):

$$
P(X)
$$

Instead of mapping $X \to y$, the model asks: *"How is $X$ organized?"*

## 2. Key Techniques and Use Cases

### A. Clustering
Grouping data points so that objects in the same group (called a **cluster**) are more similar to each other than to those in other groups.
* **Algorithms:** K-Means, DBSCAN, Hierarchical Clustering.
* **Use Case:** **Customer Segmentation**. Grouping users by purchasing behavior to create targeted marketing campaigns (see the K-Means sketch after this list).

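
A minimal scikit-learn sketch of the customer-segmentation use case, using made-up "spend" and "visits" features and an arbitrary choice of three clusters:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy customer data: columns = [monthly spend ($), visits per month]
rng = np.random.default_rng(42)
customers = np.vstack([
    rng.normal([20, 2], [5, 1], size=(50, 2)),    # occasional shoppers
    rng.normal([80, 8], [10, 2], size=(50, 2)),   # regulars
    rng.normal([200, 4], [30, 1], size=(50, 2)),  # big spenders
])

# Ask K-Means for 3 groups; no labels are ever provided.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(kmeans.cluster_centers_)      # the "average customer" of each segment
print(kmeans.labels_[:10])          # segment assigned to the first 10 customers
```
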
### B. Dimensionality Reduction
Reducing the number of random variables under consideration by obtaining a set of principal variables.
* **Algorithms:** PCA (Principal Component Analysis), t-SNE.
* **Use Case:** **Data Visualization**. Compressing 100+ features into a 2D plot to see if there are natural groupings in the data.

### C. Association Rule Learning
Discovering interesting relations between variables in large databases.
* **Algorithms:** Apriori, Eclat.
* **Use Case:** **Market Basket Analysis**. Realizing that customers who buy "Diapers" are also highly likely to buy "Beer."

### D. Anomaly Detection
Identifying rare items, events, or observations that raise suspicion by differing significantly from the majority of the data.
* **Algorithms:** Isolation Forest, One-Class SVM.
* **Use Case:** **Fraud Detection**. Spotting a credit card transaction that doesn't fit a user's normal spending profile.

## 3. Comparison: Supervised vs. Unsupervised

| Feature | Supervised Learning | Unsupervised Learning |
| :--- | :--- | :--- |
| **Data** | Labeled ($X, y$) | Unlabeled ($X$) |
| **Goal** | Predict outcomes | Find hidden patterns |
| **Feedback** | Direct (Correct/Incorrect) | None (Evaluation is subjective) |
| **Complexity** | Usually simpler to evaluate | Harder to validate results |

## 4. The Challenge of Evaluation

In Supervised Learning, you can calculate "Accuracy." In Unsupervised Learning, there is no "ground truth" to compare against. Engineers often use internal metrics like the two below (sketched in code after this list):

* **Silhouette Score:** Measures how similar an object is to its own cluster compared to other clusters.
* **Inertia:** Measures how far the points within a cluster are from their center.

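
A short sketch of both metrics on a toy clustering, using scikit-learn's `silhouette_score` and the `inertia_` attribute of a fitted K-Means model; the data here is synthetic.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic unlabeled data with 4 natural groupings.
X, _ = make_blobs(n_samples=300, centers=4, random_state=7)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Silhouette: closer to 1 means points sit firmly inside their own cluster.
print("silhouette:", silhouette_score(X, kmeans.labels_))
# Inertia: sum of squared distances to cluster centers (lower is tighter).
print("inertia:", kmeans.inertia_)
```
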
```mermaid
graph TD
    Data[Unlabeled Data] --> Model[Unsupervised Model]
    Model --> Patterns[Discovered Patterns]
    Patterns --> Insight1[Segmented Groups]
    Patterns --> Insight2[Reduced Dimensions]
    Patterns --> Insight3[Anomalies/Outliers]

    style Model fill:#fff3e0,stroke:#ef6c00,color:#333
    style Patterns fill:#e1f5fe,stroke:#01579b,color:#333
```

## 5. Real-World Applications

1. **Genetics:** Clustering DNA sequences to identify groups with similar genetic properties.
2. **Recommendation Systems:** Finding "neighbor" users who like similar movies (Collaborative Filtering).
3. **Search Engines:** Grouping similar search results or news articles into topics (Topic Modeling).

## References for More Details

* **[Scikit-Learn Unsupervised Learning](https://scikit-learn.org/stable/unsupervised_learning.html):** Technical documentation on clustering and manifold learning.
* **[Clustering Algorithms (Visual Guide)](https://projector.tensorflow.org/):** Playing with high-dimensional data in your browser.

---

**Unsupervised learning reveals the "who" and the "what" in your data. But what if you want an AI to learn how to make decisions through trial and error?**
