## 🎯 Overview

This document explains the training data structure for the 2D Point-Robot Navigator project, where a robot learns to navigate using only a 3Γ—3 perception window in 10Γ—10 environments.

Robot environment
β”Œβ”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”
β”‚Rβ”‚β–ˆβ”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚ β”‚ ← Random obstacles (β–ˆ)
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚ β”‚ β”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚
β””β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”˜

Robot local view:
β”Œβ”€β”€β”€β”¬β”€β”€β”€β”¬β”€β”€β”€β”
β”‚   β”‚ β–ˆ β”‚   β”‚
β”œβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”€
β”‚ β–ˆ β”‚ R β”‚   β”‚  Should it go RIGHT or DOWN? => DOWN
β”œβ”€β”€β”€β”Όβ”€β”€β”€β”Όβ”€β”€β”€β”€
β”‚   β”‚   β”‚   β”‚
β””β”€β”€β”€β”΄β”€β”€β”€β”΄β”€β”€β”€β”˜

What exactly is the training data? A 3Γ—3 perception plus an action label. How does it work?

(state) = (local_view, goal_delta)
(label) = expert action (from A*)
  • Features: the flattened 3Γ—3 perception grid (9 values) plus goal_delta
  • 4 actions: UP(0), DOWN(1), LEFT(2), RIGHT(3)

Training data input features

Imitation learning

Training an agent to mimic an expert’s behavior instead of discovering it purely by trial and error. You give the agent examples of state β†’ expert action, and it learns to copy those actions based on its local view.


Example with A*:

  1. Generate a 10Γ—10 grid with obstacles.
  2. Use A* (expert planner) to find the shortest path from start β†’ goal.
  3. At each step, record:
    • State (3Γ—3 local patch + relative goal vector)
    • Expert action (the next move chosen by A*)
  4. Train a neural network on these pairs.
    Result: the network learns to navigate like A*, but only using its local 3Γ—3 view and goal info.
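A minimal sketch of this data-generation loop (generate_grid, astar, and extract_state are placeholder names for the project’s own helpers; simple versions of astar and extract_state are sketched later in this document):

```python
# Action encoding used throughout this document: UP(0), DOWN(1), LEFT(2), RIGHT(3)
DELTA_TO_ACTION = {(-1, 0): 0, (1, 0): 1, (0, -1): 2, (0, 1): 3}  # (d_row, d_col) -> action

def collect_samples(n_envs=100):
    samples = []
    for _ in range(n_envs):
        grid, start, goal = generate_grid()             # 1. random 10x10 environment
        path = astar(grid, start, goal)                 # 2. expert shortest path
        if path is None:
            continue                                    # skip unsolvable maps
        for pos, nxt in zip(path[:-1], path[1:]):       # 3. record one (state, action) per step
            state = extract_state(grid, pos, goal)      #    3x3 patch + goal_delta
            action = DELTA_TO_ACTION[(nxt[0] - pos[0], nxt[1] - pos[1])]
            samples.append((state, action))
    return samples                                      # 4. feed these pairs to the network
```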

Is the final goal (destination) considered by the A* algorithm when finding the optimal path?

A* explicitly uses the goal to guide the search (through the heuristic h) and to decide when to stop (when the goal node is reached).

Heuristic function:
  • To evaluate each candidate node n, A* uses a heuristic h(n) that depends on the goal position.
  • Example (distance in a grid): the Manhattan distance between n and the goal.
  • Without the goal, you cannot compute h(n).
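As a concrete illustration (a minimal sketch, not necessarily the project’s exact code), the Manhattan heuristic for a 4-connected grid looks like this:

```python
def manhattan_h(node, goal):
    """Admissible heuristic for 4-connected grids: h(n) = |row difference| + |column difference|."""
    return abs(node[0] - goal[0]) + abs(node[1] - goal[1])
```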

What are the alternatives to the A* algorithm?

| Aspect | A* | Imitation Learning (NN) | Reinforcement Learning (RL) |
|---|---|---|---|
| Needs full map? | βœ… Yes | ❌ No (local perception + goal) | ❌ No (just rewards + perception) |
| Optimality | βœ… Guaranteed | ❌ Approximate | ❌ Approximate |
| Data need | ❌ None | βœ… Needs expert demos (A*) | βœ… Needs reward signals & lots of episodes |
| Computation | ❌ Slow search | βœ… Fast (forward pass) | ❌ Slow to train, fast at test |
| Generalization | ❌ Must re-run | βœ… Generalizes across maps | βœ… Can adapt to new maps via training |
| Interpretability | βœ… Clear | ❌ Black-box | ❌ Black-box |
| Works in partial observability | ❌ No | βœ… Yes | βœ… Yes |
| Applications | Routing, games, static maps | Robots, cars, drones | Complex robotics, unknown envs |

βœ… Summary:
  • A* = optimal, but unrealistic under partial observability.
  • Imitation learning = fast, practical, learns an A*-like policy from local views.
  • RL = flexible, works without an expert, but training is costly.
  • In practice, real robots often combine A* (global planning) with a NN (local control).

Q&A on training data

Is it required to include the initial position, current position, and final goal in the training data?

  • Initial position: ❌ Not required as an explicit feature.
    • Because at each training step the agent’s local perception is always centered on itself, so the absolute start doesn’t matter.

    How does a corner position, or a robot along a wall, look in the 3Γ—3 perception?

    Robot in the top-left corner:
    [1, 1, 1]
    [1, 0, 0]
    [1, 0, 0]
    
    • Corners β†’ two sides of the patch (rows/columns) padded with 1’s.
    • Walls β†’ one side padded with 1’s.
  • Current position: ❌ Not needed explicitly either.
    • The 3Γ—3 patch (with agent at the center) already encodes where it is relative to local obstacles.
  • Final position (goal coordinates): βœ… You need to encode the goal information (relative vector dx, dy) at every step.
    • Without this, the network wouldn’t know which direction to head.
    • goal_delta: relative vector (dx, dy) from the robot to the goal
    • action: the expert’s next move from A*
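For concreteness, here is a minimal sketch of how such a state could be built (the helper name and the goal_delta normalization are assumptions, not the project’s exact code; note that the (841, 9) arrays shown later in this document store only the 9 patch values):

```python
import numpy as np

def extract_state(grid, pos, goal):
    """Hypothetical state builder: 3x3 patch (out-of-bounds padded as wall = 1) + goal_delta."""
    rows, cols = grid.shape
    r, c = pos
    patch = np.ones((3, 3), dtype=np.float32)            # cells outside the map default to wall = 1
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                patch[dr + 1, dc + 1] = grid[rr, cc]      # 0 = free, 1 = obstacle
    goal_delta = np.array([(goal[0] - r) / rows, (goal[1] - c) / cols], dtype=np.float32)
    return np.concatenate([patch.ravel(), goal_delta])    # 9 patch values + 2 goal values

# Robot in the top-left corner of an empty map: top row and left column read as walls.
# extract_state(np.zeros((10, 10)), (0, 0), (9, 9))[:9].reshape(3, 3)
# -> [[1. 1. 1.]
#     [1. 0. 0.]
#     [1. 0. 0.]]
```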

Is training and validation accuracy measured against the final goal, or against the predicted action? If it is measured against the predicted action, how do we make sure the robot reaches its goal?

Step accuracy vs goal success

In Imitation Learning (supervised)

  • Training/validation accuracy is measured against the predicted action vs. expert action (from A*).
  • Example: if A* says β€œmove East” and your network predicts β€œEast,” that’s a correct classification.
  • So accuracy = % of steps where predicted action = expert action.
  • This is like image classification: input (3Γ—3 patch + goal delta) β†’ label (expert’s next move).
  • πŸ‘‰ During training you don’t directly measure β€œdid the robot reach the final goal”; that comes later, in evaluation.
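A sketch of the two metrics, assuming y_pred holds the network’s predicted action indices (goal success is measured separately with closed-loop rollouts, as sketched later in this Q&A):

```python
import numpy as np

# Step accuracy: fraction of states where the predicted action matches the A* expert action.
def step_accuracy(y_pred, y_true):
    return float(np.mean(np.asarray(y_pred) == np.asarray(y_true)))

# Goal success rate: fraction of full evaluation episodes where the robot actually reaches the goal
# (requires rolling the policy out in the environment, not just comparing labels).
def success_rate(episode_results):          # episode_results: list of booleans, one per episode
    return sum(episode_results) / len(episode_results)
```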

To find the optimal path, is the final goal (destination) included in the A* algorithm?

Yes β€” A* explicitly uses the goal to guide search (through the heuristic h) and to decide when to stop (when the goal node is reached).

If the A*-generated paths used for NN training already include goal information, do we still need to add an explicit β€œgoal_delta”: [0.2, 0.1] to the NN training data?

Even though the A* path β€œknows” the goal, the neural network doesn’t see the full map β€” it only gets:

  • Local 3Γ—3 perception
  • Relative direction to the goal (goal_delta)

Without explicitly giving the network the goal information, the same 3Γ—3 local view could mean different correct actions depending on which goal you’re heading toward.
(state) = (local_view, goal_delta)
(label) = expert action (from A*)

Will the robot know the final destination during testing?

Yes β€” you must tell it the final goal at the start of each episode.

  • During testing, you don’t reveal the path, but you provide the goal coordinates.
  • The robot then computes goal_delta at every step.
  • Without this, the robot would only wander locally because it wouldn’t know which direction to go.
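A minimal closed-loop test sketch under these assumptions (policy and extract_state are placeholders; extract_state is sketched earlier in this Q&A):

```python
ACTION_DELTAS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}   # UP, DOWN, LEFT, RIGHT

def rollout(policy, grid, start, goal, max_steps=100):
    """Only the goal coordinates are known up front; goal_delta is recomputed every step."""
    pos = start
    for _ in range(max_steps):
        if pos == goal:
            return True                                          # goal reached
        state = extract_state(grid, pos, goal)                   # 3x3 patch + goal_delta
        action = policy(state)                                   # predicted action index 0-3
        dr, dc = ACTION_DELTAS[action]
        nr, nc = pos[0] + dr, pos[1] + dc
        if not (0 <= nr < grid.shape[0] and 0 <= nc < grid.shape[1]) or grid[nr, nc] == 1:
            return False                                         # bumped into a wall or obstacle
        pos = (nr, nc)
    return pos == goal
```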

Does the robot need wall information in the environment?

Yes β€” the robot must perceive walls (map boundaries) inside its 3Γ—3 patch, otherwise it won’t know it’s at the edge and might try to walk out of bounds.

  • Include wall information in the 3Γ—3 patch by padding out-of-bounds cells with β€œwall = 1” in the obstacle channel.


## πŸ“Š Data Structure Summary

INPUT (X_train):                      OUTPUT (y_train):
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Type: float32 (OPTIMIZED!)      β”‚   β”‚ Type: int8       β”‚
β”‚ Shape: (841, 9)                 β”‚   β”‚ Shape: (841,)    β”‚
β”‚ Values: 0.0 to 1.0              β”‚   β”‚ Values: 0-3      β”‚
β”‚ Meaning: 3Γ—3 obstacle patterns  β”‚   β”‚ Meaning: Actions β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Each sample has one corresponding robot action.

X_train shape: (841, 9)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Sample 0:   [0,1,0,0,0,1,0,1,0] β”‚ ← 9 features per sample (3Γ—3 view along the A* path)
β”‚ Sample 1:   [0,1,0,0,0,1,0,0,0] β”‚
β”‚ Sample 2:   [1,0,0,1,0,0,0,0,1] β”‚
β”‚ ...                             β”‚
β”‚ Sample 840: [0,0,1,0,1,0,0,0,0] β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ” How to Visualize (841, 9) Shape:

Think of it as a table (in_N = input neuron):

| Sample | Feature1 (in_N1) | Feature2 (in_N2) | Feature3 (in_N3) | … | Feature9 (in_N9) | Action |
|---|---|---|---|---|---|---|
| 0 | 0.0 | 1.0 | 0.0 | … | 0.0 | 1 |
| 1 | 0.0 | 1.0 | 0.0 | … | 0.0 | 1 |
| 2 | 1.0 | 0.0 | 0.0 | … | 1.0 | 0 |
| … | … | … | … | … | … | … |
| 840 | 0.0 | 0.0 | 1.0 | … | 0.0 | 2 |

Key Numbers:

  • 841 samples: Total training examples
  • 9 features: Flattened 3Γ—3 perception grid
  • 4 actions: UP(0), DOWN(1), LEFT(2), RIGHT(3)
  • 100 environments: Diverse training scenarios

## 🧠 How Training Data is Generated

Example (state, label) pair. This extended variant also appends the last three actions, one-hot encoded with the UP/DOWN/LEFT/RIGHT indices above, to the 9 perception features:

Input:  [0, 1, 0, 1, 1, 0, 0, 1, 0,   # 3x3 perception (9 features)
         1, 0, 0, 0,                   # Last action: UP (one-hot)
         0, 1, 0, 0,                   # 2nd last action: DOWN (one-hot)
         0, 0, 1, 0]                   # 3rd last action: LEFT (one-hot)
Output: [2]                           # Action (LEFT)

β”Œβ”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”
β”‚Rβ”‚β–ˆβ”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚ β”‚ ← Robot at edge
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚
β””β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”˜
Current 3Γ—3 View:
β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”
β”‚ 1.0 β”‚ 1.0 β”‚ 1.0 β”‚ ← All treated as obstacles
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€
β”‚ 0.0 β”‚ 0.0 β”‚ 1.0 β”‚ ← Missing wall information
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€
β”‚ 0.0 β”‚ 1.0 β”‚ 0.0 β”‚
β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜

### Step 1: Environment Creation

Generate a 10Γ—10 grid with random obstacles:

β”Œβ”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”
β”‚Rβ”‚β–ˆβ”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚ β”‚ ← Random obstacles (β–ˆ)
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚ β”‚ β”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚
β””β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”˜

Start: (0,0) Goal: (9,9)
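A minimal sketch of this step (the 15% obstacle density and fixed start/goal corners are illustrative assumptions):

```python
import numpy as np

def make_environment(size=10, obstacle_prob=0.15, rng=None):
    """Random size x size grid: 0 = free cell, 1 = obstacle; start and goal kept free."""
    rng = rng or np.random.default_rng()
    grid = (rng.random((size, size)) < obstacle_prob).astype(np.float32)
    start, goal = (0, 0), (size - 1, size - 1)
    grid[start] = 0.0
    grid[goal] = 0.0
    return grid, start, goal
```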

### Step 2: A* Pathfinding

The A* algorithm finds the optimal path:
β”Œβ”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”
β”‚Rβ”‚β–ˆβ”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚ β”‚ ← R = Robot start
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
│↓│ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ ← ↓ = Path direction
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
│↓│ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚β†“β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
│↓│ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
│↓│ β”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
│↓│ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
│↓│ β”‚ β”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚β†’β†’β†’β†’β†’β†’β†’β†’β†’Gβ”‚ ← G = Goal
β””β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”˜

A* Algorithm Result:

Complete Path: [(0,0) β†’ (1,0) β†’ (2,0) β†’ (3,0) β†’ (4,0) β†’ (5,0) β†’ (6,0) β†’ (7,0) β†’ (8,0) β†’ (9,0) β†’ (9,1) β†’ (9,2) β†’ (9,3) β†’ (9,4) β†’ (9,5) β†’ (9,6) β†’ (9,7) β†’ (9,8) β†’ (9,9)]

Path Length: 18 steps

Path Validity: βœ… (reaches goal)

Optimality: βœ… (shortest possible path)
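For reference, a compact A* sketch on a 4-connected grid (a standard textbook implementation, not necessarily the project’s exact code):

```python
import heapq

def astar(grid, start, goal):
    """Plain A* on a 4-connected grid (0 = free, 1 = obstacle).
    Returns the list of cells from start to goal, or None if no path exists."""
    def h(n):                                        # Manhattan heuristic toward the goal
        return abs(n[0] - goal[0]) + abs(n[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    open_heap = [(h(start), 0, start)]               # entries are (f, g, node)
    came_from = {}
    best_g = {start: 0}

    while open_heap:
        f, g, node = heapq.heappop(open_heap)
        if node == goal:                             # stop condition uses the goal
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return path[::-1]
        if g > best_g.get(node, float("inf")):
            continue                                 # stale heap entry
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nb = (node[0] + dr, node[1] + dc)
            if 0 <= nb[0] < rows and 0 <= nb[1] < cols and grid[nb[0]][nb[1]] == 0:
                ng = g + 1
                if ng < best_g.get(nb, float("inf")):
                    best_g[nb] = ng
                    came_from[nb] = node
                    heapq.heappush(open_heap, (ng + h(nb), ng, nb))
    return None
```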


### Step 3: Extract 3Γ—3 Perceptions


At each step along the A* path:

Position (0,0): Position (1,0):
β”Œβ”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β” β”Œβ”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”¬β”€β”
β”‚Rβ”‚β–ˆβ”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€ β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚Rβ”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚
β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€ β”œβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”Όβ”€β”€
β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚β–ˆβ”‚ β”‚ β”‚
β””β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”˜ β””β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”΄β”€β”˜

3Γ—3 View: 3Γ—3 View:

β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”
β”‚ 0.0 β”‚ 1.0 β”‚ 0.0 β”‚ ← Out of β”‚ 0.0 β”‚ 1.0 β”‚ 0.0 β”‚ ← Out of
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€ bounds β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€ bounds
β”‚ 0.0 β”‚ 0.0 β”‚ 1.0 β”‚ β”‚ 0.0 β”‚ 0.0 β”‚ 1.0 β”‚
β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€ β”œβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€
β”‚ 0.0 β”‚ 1.0 β”‚ 0.0 β”‚ β”‚ 0.0 β”‚ 0.0 β”‚ 0.0 β”‚
β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜

Flattened: [0,1,0,0,0,1,0,1,0] 
Flattened: [0,1,0,0,0,1,0,0,0]

Action: 1 (DOWN)

## πŸ” Data Structure Details
### Input Data (X_train)
- **Shape**: (841, 9)
- **Type**: float32 (cast from NumPy's float64 default)
- **Values**: 0.0 (empty) to 1.0 (obstacle)
- **Structure**: Each row is a flattened 3Γ—3 perception

  

### Output Data (y_train)

- **Shape**: (841,)

- **Type**: int8 (cast from NumPy's int64 default)

- **Values**: 0-3 (action indices)

- **Mapping**: 0=UP, 1=DOWN, 2=LEFT, 3=RIGHT

  

### Why These Data Types?

  

**Input (float32):**

- **Neural network compatibility**: networks require floating-point inputs, and PyTorch/TensorFlow default to float32

- **Obstacle probability**: the 0.0-1.0 range can also encode partial visibility later (e.g. 0.5 = partially visible obstacle)

- **NumPy default**: `np.array()` creates float64, so the data is cast down to float32 for training

  

**Output (int8):**

- **Discrete actions**: Only 4 possible values (0,1,2,3)

- **Memory efficient**: Integers use less memory than floats

- **Direct indexing**: Can be used as array indices

  

> **πŸ€” Why int8 rather than NumPy's int64 default for 4 discrete actions?**
>
> For only 4 values (0-3), `int8` is plenty:
>
> ```python
> # Memory comparison for 841 samples:
> # int64: 841 Γ— 8 bytes = 6,728 bytes
> # int8:  841 Γ— 1 byte  =   841 bytes   # 8x less memory!
> ```
>
> That said, leaving the labels as `int64` would also be fine:
> - It is NumPy's default integer type
> - ML libraries accept it directly
> - The impact is negligible for 841 samples (6 KB vs 1 KB)
> - The cast is a one-liner: `y_train.astype(np.int8)`
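A small illustration of the casts (the array contents here are placeholders):

```python
import numpy as np

X_train = np.zeros((841, 9))               # NumPy default: float64
y_train = np.zeros(841, dtype=np.int64)    # NumPy default for integer labels: int64

X_train = X_train.astype(np.float32)       # 841 Γ— 9 Γ— 4 bytes β‰ˆ 30 KB
y_train = y_train.astype(np.int8)          # 841 Γ— 1 byte = 841 bytes
print(X_train.nbytes, y_train.nbytes)      # -> 30276 841
```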

  

## 🧬 Biological Inspiration

  

**Local Perception**: Like animals using limited peripheral vision to navigate

**Expert Demonstrations**: A* provides optimal "expert" decisions

**Pattern Learning**: Robot learns obstacle-action relationships through repetition

  

The robot learns to map 3Γ—3 obstacle patterns to navigation actions, mimicking how animals use local sensory input to make movement decisions!

  

## πŸ”§ Neural Network Architecture

  

Input Layer: 9 neurons (3Γ—3 flattened perception)

Hidden Layer 1: 64 neurons + ReLU + Dropout(0.2)

Hidden Layer 2: 32 neurons + ReLU + Dropout(0.2)

Output Layer: 4 neurons + Softmax
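A minimal PyTorch sketch of this architecture (assuming the 9-feature input described above; the softmax is usually folded into the cross-entropy loss during training):

```python
import torch.nn as nn

class NavigatorNet(nn.Module):
    """9 -> 64 -> 32 -> 4 MLP matching the layer sizes listed above."""
    def __init__(self, n_inputs=9, n_actions=4):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_inputs, 64), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(64, 32), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(32, n_actions),          # raw logits; apply softmax only at inference if needed
        )

    def forward(self, x):
        return self.layers(x)
```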


  

**Training Strategy:**

- Split: 80% train, 20% validation

- Batch size: 32-64

- Learning rate: 0.001 with decay

- Epochs: 50-100 with early stopping
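One possible way to wire up this strategy with the model sketched above (the scheduler step size, patience, and exact batch sizes are illustrative assumptions):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

def train(model, X, y, epochs=100, patience=10):
    ds = TensorDataset(torch.as_tensor(X, dtype=torch.float32),
                       torch.as_tensor(y, dtype=torch.long))
    n_val = int(0.2 * len(ds))                                   # 80/20 train/validation split
    train_ds, val_ds = random_split(ds, [len(ds) - n_val, n_val])
    train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)
    val_dl = DataLoader(val_ds, batch_size=64)

    opt = torch.optim.Adam(model.parameters(), lr=0.001)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=20, gamma=0.5)  # simple decay
    loss_fn = torch.nn.CrossEntropyLoss()

    best_val, bad_epochs = float("inf"), 0
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_dl:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
        sched.step()

        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(model(xb), yb).item() for xb, yb in val_dl)
        if val_loss < best_val:                                  # early stopping on validation loss
            best_val, bad_epochs = val_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return model
```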

  

## πŸ“ˆ Key Insights

  

1. **Data Balance**: Action distribution is well-balanced (imbalance ratio: 1.25)

2. **Perception Complexity**: Average 2.05 obstacles per 3Γ—3 view

3. **Environment Diversity**: 100 different environments with varying complexity

4. **Optimal Labels**: A* guarantees shortest path decisions

5. **Biological Connection**: Mimics animal navigation with limited vision

  

**Bottom Line:** The robot learns to navigate by observing optimal A* decisions at each position, building a comprehensive map of obstacle patterns to actions through supervised learning.