## Overview
This document explains the training data structure for the 2D Point-Robot Navigator project, where a robot learns to navigate using only a 3×3 perception window in 10×10 environments.
Robot environment:

(10×10 grid diagram: robot `R` at the top-left corner, random obstacles scattered across the map)
Robot local view:

(3×3 local view diagram: robot `R` at the center, deciding whether to go RIGHT or DOWN => DOWN)
What is the training data exactly? A 3×3 perception patch plus an action tag. How does it work?
(state) = (local_view, goal_delta)
(label) = expert action (from A*)
- Features: flattened 3×3 perception grid (9 values) + goal_delta
- ==4 actions: UP(0), DOWN(1), LEFT(2), RIGHT(3)==
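As a concrete illustration, here is one hypothetical (state, label) pair written out as plain Python; the field names `local_view` and `goal_delta` and the specific values are illustrative, not the project's actual schema:

```python
# One hypothetical training sample: the flattened 3x3 local view (1 = obstacle,
# 0 = free), the normalized relative goal vector, and the expert action from A*.
sample_state = {
    "local_view": [0, 1, 0,
                   0, 0, 1,
                   0, 1, 0],
    "goal_delta": [0.2, 0.1],   # (dx, dy) from robot to goal, scaled by the map size
}
sample_label = 1                # expert action: DOWN (0=UP, 1=DOWN, 2=LEFT, 3=RIGHT)
```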
Training data input features
Imitation learning
Training an agent to mimic an expert's behavior instead of discovering it purely by trial and error. You give the agent examples of state → expert action, and it learns to copy those actions based on its local view.
A* explicitly uses the goal to guide search (through the heuristic h) and to decide when to stop (when the goal node is reached).
Example with A*:
- Generate a 10×10 grid with obstacles.
- Use A* (expert planner) to find the shortest path from start → goal.
- At each step, record:
  - State (3×3 local patch + relative goal vector)
  - Expert action (the next move chosen by A*)
- Train a neural network on these pairs.

Result: the network learns to navigate like A*, but only using its local 3×3 view and goal info.
Is the final goal or destination considered by the A* algorithm when finding the optimal path?
A* explicitly uses the goal to guide search (through the heuristic h) and to decide when to stop (when the goal node is reached).
Heuristic function:
- To evaluate each candidate node n, A* uses a heuristic that depends on the goal position.
- Example (Manhattan distance in a grid): h(n) = |x_n - x_goal| + |y_n - y_goal|.
- Without the goal, you cannot compute h(n).
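A minimal sketch of such a grid heuristic in Python (the function name is illustrative):

```python
def manhattan_h(node, goal):
    """Admissible A* heuristic for a 4-connected grid: |dx| + |dy| to the goal."""
    (x, y), (gx, gy) = node, goal
    return abs(x - gx) + abs(y - gy)
```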
What are other alternatives to the A* algorithm?
| Aspect | A* | Imitation Learning (NN) | Reinforcement Learning (RL) |
|---|---|---|---|
| Needs full map? | Yes | No (local perception + goal) | No (just rewards + perception) |
| Optimality | Guaranteed | Approximate | Approximate |
| Data need | None | Needs expert demos (A*) | Needs reward signals & lots of episodes |
| Computation | Slow search | Fast (forward pass) | Slow to train, fast at test |
| Generalization | Must re-run | Generalizes across maps | Can adapt to new maps via training |
| Interpretability | Clear | Black-box | Black-box |
| Works in partial observability | No | Yes | Yes |
| Applications | Routing, games, static maps | Robots, cars, drones | Complex robotics, unknown envs |
Summary:
- A* = optimal, but unrealistic under partial observability.
- Imitation learning = fast, practical, learns an A*-like policy from local views.
- RL = flexible, works without an expert, but training is costly.
- In practice, real robots often combine A* (global planning) + NN (local control).
Q&A on training data
Is it required to include the initial position, current position, and final goal in the training data?
- Initial position: Not required as an explicit feature.
  - At each training step the agent's local perception is always centered on itself, so the absolute start doesn't matter.
What does a corner position, or a robot along a wall, look like in the 3×3 perception?
Robot in the top-left corner: `[1, 1, 1], [1, 0, 0], [1, 0, 0]`
- Corners → two sides of the patch (a row and a column) padded with 1's.
- Walls → one side padded with 1's.
- Current position: Not needed explicitly either.
  - The 3×3 patch (with the agent at the center) already encodes where it is relative to local obstacles.
- Final position (goal coordinates): Required.
  - You need to encode the goal information (relative vector dx, dy) at every step.
  - Without this, the network wouldn't know which direction to head.
- goal_delta: relative vector (dx, dy) from the robot to the goal
- action: expert's next move from A*
Is training and validation accuracy measured against reaching the final goal, or against the predicted action? If it is measured against the predicted action, how do we make sure the robot reaches its goal?
Step accuracy vs goal success
In Imitation Learning (supervised)
- Training/validation accuracy is measured against the predicted action vs. expert action (from A*).
- Example: if A* says "move East" and your network predicts "East", that's a correct classification.
- So accuracy = % of steps where predicted action = expert action. This is like image classification: input (3×3 patch + goal delta) → label (expert's next move).
- Here, you don't directly measure "did the robot reach the final goal" during training. That comes later, in evaluation: roll out the trained policy in test environments and count how often it actually reaches the goal.
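A small sketch of the two metrics in Python; the helper names are hypothetical, and the rollout outcomes are assumed to come from a separate evaluation loop:

```python
import numpy as np

def step_accuracy(pred_actions, expert_actions):
    """Supervised metric: fraction of steps where the predicted action matches A*."""
    return float(np.mean(np.asarray(pred_actions) == np.asarray(expert_actions)))

def goal_success_rate(episode_reached_goal):
    """Evaluation metric: fraction of rollout episodes that actually reached the goal."""
    return float(np.mean(np.asarray(episode_reached_goal, dtype=bool)))

# High step accuracy does not automatically guarantee 100% goal success:
print(step_accuracy([1, 1, 3, 0], [1, 1, 3, 2]))      # 0.75
print(goal_success_rate([True, True, False, True]))   # 0.75
```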
To find the optimal path, is the final goal or destination included in the A* algorithm?
Yes: A* explicitly uses the goal to guide search (through the heuristic h) and to decide when to stop (when the goal node is reached).
If the A*-generated path used for NN training already includes goal information, do we still need to add an explicit `goal_delta` (e.g. [0.2, 0.1]) to the NN training data?
Even though the A* path "knows" the goal, the neural network doesn't see the full map; it only gets:
- Local 3×3 perception
- Relative direction to the goal (goal_delta)

Without explicitly giving the network the goal information, the same 3×3 local view could mean different correct actions depending on which goal you're heading toward.
(state) = (local_view, goal_delta)
(label) = expert action (from A*)
Will the robot know the final destination during testing?
Yes: you must tell it the final goal at the start of each episode.
- During testing, you don't reveal the path, but you provide the goal coordinates.
- The robot then computes goal_delta at every step.
- Without this, the robot would only wander locally because it wouldn't know which direction to go.
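A test-time rollout sketch under these assumptions: a trained `model`, a known `goal`, and a hypothetical `extract_patch` helper (a padding version is sketched a little further below); only the goal coordinates are given, never the path:

```python
import numpy as np

def rollout(model, grid, start, goal, max_steps=100):
    """Step the learned policy: build (3x3 patch + goal_delta), predict, move."""
    moves = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}    # UP, DOWN, LEFT, RIGHT (row, col)
    pos = start
    for _ in range(max_steps):
        if pos == goal:
            return True                                       # goal reached
        patch = extract_patch(grid, pos)                      # hypothetical 3x3 extractor
        goal_delta = [(goal[0] - pos[0]) / 10.0,              # normalized relative goal vector
                      (goal[1] - pos[1]) / 10.0]
        state = np.concatenate([patch.ravel(), goal_delta]).astype(np.float32)
        action = int(model.predict(state[None, :]).argmax())  # assumed model API
        dr, dc = moves[action]
        pos = (pos[0] + dr, pos[1] + dc)
    return False                                              # ran out of steps
```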
Does the robot need wall information in the environment?
Yes: the robot must perceive walls (map boundaries) inside its 3×3 patch, otherwise it won't know it's at the edge and might try to walk out of bounds.
- Include wall information in the 3×3 patch by padding out-of-bounds cells with "wall = 1" in the obstacle channel.
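A minimal sketch of that padding, assuming a NumPy grid where 1 = obstacle; the function name `extract_patch` is illustrative:

```python
import numpy as np

def extract_patch(grid, pos, pad_value=1.0):
    """Return the 3x3 neighborhood around pos; cells outside the map count as walls."""
    rows, cols = grid.shape
    patch = np.full((3, 3), pad_value, dtype=np.float32)      # start as all-wall
    r0, c0 = pos
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = r0 + dr, c0 + dc
            if 0 <= r < rows and 0 <= c < cols:
                patch[dr + 1, dc + 1] = grid[r, c]            # copy in-bounds cells
    return patch
```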
## Data Structure Summary
| | INPUT (X_train) | OUTPUT (y_train) |
|---|---|---|
| Type | float32 (optimized) | int8 |
| Shape | (841, 9) | (841,) |
| Values | 0.0 to 1.0 | 0-3 |
| Meaning | 3×3 obstacle patterns | Actions |

For each sample in X_train there is one corresponding action in y_train.
X_train shape: (841, 9). Each row is one flattened 3×3 view recorded along an A* path (9 features per sample):
- Sample 0: [0,1,0,0,0,1,0,1,0]
- Sample 1: [0,1,0,0,0,1,0,0,0]
- Sample 2: [1,0,0,1,0,0,0,0,1]
- ...
- Sample 840: [0,0,1,0,1,0,0,0,0]
### How to Visualize the (841, 9) Shape
Think of it as a table (in_N = input neuron):
| Sample | Feature1 (in_N1) | Feature2 (in_N2) | Feature3 (in_N3) | … | Feature9 (in_N9) | Action |
|---|---|---|---|---|---|---|
| 0 | 0.0 | 1.0 | 0.0 | … | 0.0 | 1 |
| 1 | 0.0 | 1.0 | 0.0 | … | 0.0 | 1 |
| 2 | 1.0 | 0.0 | 0.0 | … | 1.0 | 0 |
| … | … | … | … | … | … | … |
| 840 | 0.0 | 0.0 | 1.0 | … | 0.0 | 2 |
Key Numbers:
- 841 samples: Total training examples
- 9 features: Flattened 3×3 perception grid
- 4 actions: UP(0), DOWN(1), LEFT(2), RIGHT(3)
- 100 environments: Diverse training scenarios
## How Training Data is Generated
Example input/output pair for a variant whose state also includes the last three actions (one-hot encoded) alongside the 3×3 perception:
Input: [0, 1, 0, 1, 1, 0, 0, 1, 0, # 3x3 perception (9 features)
        1, 0, 0, 0, # Last action: UP (one-hot)
        0, 1, 0, 0, # 2nd last action: DOWN (one-hot)
        0, 0, 1, 0] # 3rd last action: LEFT (one-hot)
Output: [2] # Action (LEFT)
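A sketch of assembling such a 9 + 3×4 = 21-feature vector (helper names are illustrative; this action-history variant is an alternative to the plain 9-feature input used elsewhere in this document):

```python
import numpy as np

def one_hot(action, n_actions=4):
    """Encode an action index (0=UP, 1=DOWN, 2=LEFT, 3=RIGHT) as a one-hot vector."""
    v = np.zeros(n_actions, dtype=np.float32)
    v[action] = 1.0
    return v

def build_features(patch_flat, last_actions):
    """Concatenate the flattened 3x3 perception with one-hot encodings of the
    last three actions -> 9 + 3*4 = 21 features."""
    history = [one_hot(a) for a in last_actions]              # most recent first
    return np.concatenate([np.asarray(patch_flat, dtype=np.float32), *history])

x = build_features([0, 1, 0, 1, 1, 0, 0, 1, 0], last_actions=[0, 1, 2])  # UP, DOWN, LEFT
print(x.shape)  # (21,)
```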
(Grid diagram: robot `R` at the top-left edge of the map, next to obstacles)
Current 3×3 view:
- Row 1: 1.0, 1.0, 1.0 ← all treated as obstacles
- Row 2: 0.0, 0.0, 1.0 ← missing wall information
- Row 3: 0.0, 1.0, 0.0
### Step 1: Environment Creation
Generate a 10×10 grid with random obstacles:

(10×10 grid diagram: robot `R` at the top-left, random obstacles scattered across the map)
Start: (0,0) Goal: (9,9)
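A simple way such environments could be generated (the obstacle density and the clearing of start/goal cells are assumptions):

```python
import numpy as np

def make_environment(size=10, obstacle_prob=0.2, seed=None):
    """Create a size x size grid: 1.0 = obstacle, 0.0 = free; start and goal kept free."""
    rng = np.random.default_rng(seed)
    grid = (rng.random((size, size)) < obstacle_prob).astype(np.float32)
    grid[0, 0] = 0.0                  # start cell
    grid[size - 1, size - 1] = 0.0    # goal cell
    return grid

grid = make_environment(seed=0)
print(grid.shape)  # (10, 10)
```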
### Step 2: A* Pathfinding
The A* algorithm finds the optimal path:

(10×10 grid diagram: R = robot start at the top-left, arrows mark the path direction, G = goal at the bottom-right)
A* Algorithm Result:
Complete Path: [(0,0) → (0,1) → (1,1) → (2,1) → (3,1) → (4,1) → (5,1) → (6,1) → (7,1) → (8,1) → (8,2) → (8,3) → (8,4) → (8,5) → (8,6) → (8,7) → (8,8) → (9,8) → (9,9)]
Path Length: 18 steps
Path Validity: Yes (reaches goal)
Optimality: Yes (shortest possible path)
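A compact A* sketch for this grid setting (4-connected moves, Manhattan heuristic); it is a generic textbook implementation, not necessarily the project's exact planner:

```python
import heapq

def astar(grid, start, goal):
    """Shortest obstacle-free path from start to goal as a list of cells, or None."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])    # Manhattan heuristic
    open_heap = [(h(start), 0, start)]                         # (f, g, node)
    came_from, g_score = {}, {start: 0}
    while open_heap:
        _, g, current = heapq.heappop(open_heap)
        if current == goal:                                    # stop once the goal is reached
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):      # UP, DOWN, LEFT, RIGHT
            nb = (current[0] + dr, current[1] + dc)
            if not (0 <= nb[0] < rows and 0 <= nb[1] < cols) or grid[nb[0]][nb[1]] == 1:
                continue                                       # out of bounds or obstacle
            tentative = g + 1
            if tentative < g_score.get(nb, float("inf")):
                g_score[nb] = tentative
                came_from[nb] = current
                heapq.heappush(open_heap, (tentative + h(nb), tentative, nb))
    return None
```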
### Step 3: Extract 3×3 Perceptions
At each step along the A* path:
(Grid diagrams: the robot shown at map positions (0,0) and (1,0) along the path)

Position (0,0):
3×3 view:
- Row 1: 0.0, 1.0, 0.0 ← out of bounds
- Row 2: 0.0, 0.0, 1.0
- Row 3: 0.0, 1.0, 0.0
Flattened: [0,1,0,0,0,1,0,1,0]
Action: 1 (DOWN)

Position (1,0):
3×3 view:
- Row 1: 0.0, 1.0, 0.0 ← out of bounds
- Row 2: 0.0, 0.0, 1.0
- Row 3: 0.0, 0.0, 0.0
Flattened: [0,1,0,0,0,1,0,0,0]
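Putting steps 1-3 together, a sketch of how the (state, action) pairs could be collected along each A* path, reusing the hypothetical `make_environment`, `astar`, and `extract_patch` helpers sketched earlier:

```python
import numpy as np

ACTION_OF_DELTA = {(-1, 0): 0, (1, 0): 1, (0, -1): 2, (0, 1): 3}  # UP, DOWN, LEFT, RIGHT

def collect_pairs(n_envs=100, size=10):
    """Run A* in many random environments and record (3x3 patch + goal_delta, action)."""
    X, y = [], []
    for seed in range(n_envs):
        grid = make_environment(size=size, seed=seed)
        start, goal = (0, 0), (size - 1, size - 1)
        path = astar(grid, start, goal)
        if path is None:                                       # unsolvable map: skip it
            continue
        for pos, nxt in zip(path[:-1], path[1:]):
            patch = extract_patch(grid, pos).ravel()
            goal_delta = [(goal[0] - pos[0]) / size, (goal[1] - pos[1]) / size]
            X.append(np.concatenate([patch, goal_delta]))
            y.append(ACTION_OF_DELTA[(nxt[0] - pos[0], nxt[1] - pos[1])])
    return np.asarray(X, dtype=np.float32), np.asarray(y, dtype=np.int8)
```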
## Data Structure Details
### Input Data (X_train)
- **Shape**: (841, 9)
- **Type**: float64
- **Values**: 0.0 (empty) to 1.0 (obstacle)
- **Structure**: Each row is a flattened 3×3 perception
### Output Data (y_train)
- **Shape**: (841,)
- **Type**: int64
- **Values**: 0-3 (action indices)
- **Mapping**: 0=UP, 1=DOWN, 2=LEFT, 3=RIGHT
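A quick NumPy sanity check of these shapes and dtypes (the arrays here are placeholders; in the project they come from the data-generation step):

```python
import numpy as np

X_train = np.zeros((841, 9), dtype=np.float64)   # placeholder with the documented shape
y_train = np.zeros((841,), dtype=np.int64)

assert X_train.shape == (841, 9) and y_train.shape == (841,)
assert set(np.unique(y_train)) <= {0, 1, 2, 3}   # only the four action indices

# Optional memory optimization discussed below: cast to smaller types.
X_small = X_train.astype(np.float32)
y_small = y_train.astype(np.int8)
print(X_small.nbytes, y_small.nbytes)            # 30276 841
```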
### Why These Data Types?
**Input (float32):**
- **Neural network compatibility**: Requires floating-point inputs
- **Obstacle probability**: Can represent partial visibility (0.0-1.0)
- **Standard practice**: ML libraries expect floating-point inputs
**Why a float type for binary obstacle values?**
- Neural Network Requirement: ML libraries expect floating-point inputs
- NumPy Default: np.array() creates float64 by default
- Future Flexibility: Could add partial visibility later (0.5 = partially visible obstacle)
- Library Compatibility: PyTorch/TensorFlow expect float inputs
**Output (int8):**
- **Discrete actions**: Only 4 possible values (0,1,2,3)
- **Memory efficient**: Integers use less memory than floats
- **Direct indexing**: Can be used as array indices
> **Why not int8 for 4 discrete actions?**
>
> You're absolutely right! For only 4 values, we could use `int8`:
> ```python
> # Memory comparison for 841 samples:
> int64: 841 × 8 bytes = 6,728 bytes
> int8:  841 × 1 byte  =   841 bytes  # 8x less memory!
> ```
>
> **Why we use int64 anyway:**
> - NumPy default behavior
> - ML library compatibility
> - Negligible impact for 841 samples (6KB vs 1KB)
> - Easy to optimize later: `y_train.astype(np.int8)`
## Biological Inspiration
**Local Perception**: Like animals using limited peripheral vision to navigate
**Expert Demonstrations**: A* provides optimal "expert" decisions
**Pattern Learning**: Robot learns obstacle-action relationships through repetition
The robot learns to map 3×3 obstacle patterns to navigation actions, mimicking how animals use local sensory input to make movement decisions!
## Neural Network Architecture
Input Layer: 9 neurons (3×3 flattened perception)
Hidden Layer 1: 64 neurons + ReLU + Dropout(0.2)
Hidden Layer 2: 32 neurons + ReLU + Dropout(0.2)
Output Layer: 4 neurons + Softmax
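A PyTorch sketch of this architecture, assuming the 9-feature input described above; the softmax is left to the loss/inference step, since cross-entropy in PyTorch expects raw logits:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(9, 64),   # input: 9 features (flattened 3x3 perception)
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(32, 4),   # output: logits for the 4 actions (softmax at inference time)
)
```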
**Training Strategy:**
- Split: 80% train, 20% validation
- Batch size: 32-64
- Learning rate: 0.001 with decay
- Epochs: 50-100 with early stopping
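A sketch of that training setup with PyTorch and scikit-learn, reusing the `model` above; `X_all` and `y_all` are assumed to hold the generated dataset, and the early-stopping patience value is an assumption:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split

# 80/20 train/validation split of the generated dataset (X_all, y_all assumed)
X_tr, X_val, y_tr, y_val = train_test_split(X_all, y_all, test_size=0.2, random_state=0)
train_loader = DataLoader(
    TensorDataset(torch.tensor(X_tr, dtype=torch.float32),
                  torch.tensor(y_tr, dtype=torch.long)),
    batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)  # LR decay
criterion = torch.nn.CrossEntropyLoss()

best_val, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(100):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        criterion(model(xb), yb).backward()
        optimizer.step()
    scheduler.step()
    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(torch.tensor(X_val, dtype=torch.float32)),
                             torch.tensor(y_val, dtype=torch.long)).item()
    bad_epochs = 0 if val_loss < best_val else bad_epochs + 1
    best_val = min(best_val, val_loss)
    if bad_epochs >= patience:    # early stopping on validation loss
        break
```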
## Key Insights
1. **Data Balance**: Action distribution is well-balanced (imbalance ratio: 1.25)
2. **Perception Complexity**: Average 2.05 obstacles per 3×3 view
3. **Environment Diversity**: 100 different environments with varying complexity
4. **Optimal Labels**: A* guarantees shortest path decisions
5. **Biological Connection**: Mimics animal navigation with limited vision
**Bottom Line:** The robot learns to navigate by observing optimal A* decisions at each position, building a comprehensive map of obstacle patterns to actions through supervised learning.