Becoming a Researcher

Inspired by "Spinning Up as a Deep RL Researcher" from OpenAI.

This page offers advice for early-stage PhD students on how to develop research taste, execute projects, and contribute to the embodied AI community. The advice is opinionated — take what resonates and adapt it to your situation.

Building the Right Background

Core Skills

Skill	Why It Matters	How to Build It
Math fluency	RL theory relies on probability, optimization, and linear algebra	Work through derivations by hand instead of only reading them
Implementation ability	Research ideas need working code to validate	Implement papers from scratch (don't just use libraries)
Experiment design	Bad experiments waste months	Study experimental methodology, learn proper baselines
Writing	Papers must communicate clearly	Read top papers for style, write early and often
Critical reading	Need to evaluate others' claims	Practice reading papers critically — what's the real contribution?

The Implementation Imperative

Implement key algorithms yourself. This is the single most important thing you can do as a new RL researcher. Using a library's PPO implementation does not build understanding.

Suggested progression:

REINFORCE: get the policy gradient working on CartPole
DQN: implement experience replay and target networks on Atari Pong
PPO: full implementation with GAE, clipping, and a value function
SAC: off-policy continuous control with entropy tuning

Each implementation will surface misunderstandings that reading alone cannot reveal.
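As a starting point for the REINFORCE and PPO steps in the progression above, here is a minimal sketch of two return estimators you will end up writing yourself: the discounted reward-to-go and generalized advantage estimation (GAE). The function names, argument names, and default values are illustrative assumptions, and the code handles a single episode in plain Python lists rather than batched tensors.

```python
def discounted_returns(rewards, gamma=0.99):
    """Reward-to-go G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def gae_advantages(rewards, values, gamma=0.99, lam=0.95, last_value=0.0):
    """Generalized advantage estimation for one episode.

    values[t] is the critic's estimate V(s_t); last_value is the value of the
    state reached after the final step (0.0 if the episode terminated).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # One-step TD residual: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future residuals
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

A useful sanity check when debugging: with lam=1 and a terminated episode, each advantage equals the discounted return minus the critic's value at that step.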

Developing Research Taste

Read Broadly, Then Deeply

Phase 1: Survey (first few months)

Skim 2-3 papers per day (read the abstract, then skim the rest)
Cover all major subareas in your field
Build a mental map of the research landscape
Don't try to understand everything deeply — build breadth first

Phase 2: Deep dive (once you've found your area)

Read seminal papers deeply — line by line, derivation by derivation
Implement the key methods
Understand why design choices were made
Read the related work sections to find connections

Ask the Right Questions

Good research starts with good questions. When reading a paper, ask:

What assumption does this make that might not hold?
What would break if we changed X?
Why didn't they try Y?
What's the simplest possible version of this that would still work?
Where does this fail, and can failures be characterized?

Find Your Niche

The best research comes from unique combinations of expertise:

RL + control theory → better policy optimization
World models + computer vision → better representation learning
Distributed systems + RL → scalable training
Robotics + RL + human data → practical embodied AI

Find the intersection that excites you and where you have (or can build) an advantage.

Executing a Research Project

The Research Cycle

graph LR
    I[Idea] --> H[Hypothesis]
    H --> E[Experiment]
    E --> A[Analysis]
    A --> I2[Refined Idea]
    I2 --> H

Step 1: Idea Generation

Sources of ideas:

Failures and limitations of existing methods (read "Limitations" sections of papers)
Combining ideas from different subfields
Scaling up or down existing approaches
Ablation studies that reveal what actually matters
Real-world deployment challenges (sim-to-real gap, safety, latency)

Step 2: Feasibility Check

Before committing months to an idea:

Can you state the hypothesis in one sentence?
What's the simplest experiment that would test it?
What baseline would you compare against?
What result would convince you to stop (positive or negative)?
Is this achievable with your compute budget?
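The compute-budget question in the checklist above can often be answered with back-of-the-envelope arithmetic before any code is written. The sketch below multiplies out a full sweep; the function name and every number in it (method count, environment count, hours per run, budget) are hypothetical placeholders to replace with your own plan.

```python
def gpu_hours(num_methods, num_envs, num_seeds, hours_per_run):
    """Total GPU-hours for a full sweep: every method on every env with every seed."""
    return num_methods * num_envs * num_seeds * hours_per_run


# Hypothetical plan: proposed method + 2 baselines + 3 ablations,
# 4 environments, 5 seeds each, about 6 GPU-hours per run.
total = gpu_hours(num_methods=6, num_envs=4, num_seeds=5, hours_per_run=6.0)
budget = 1000.0  # hypothetical GPU-hour allocation
print(f"planned: {total:.0f} GPU-hours, budget: {budget:.0f}, fits: {total <= budget}")
# prints "planned: 720 GPU-hours, budget: 1000, fits: True"
```

This estimate covers only the final sweep. Debugging runs and hyperparameter tuning come on top of it, so treat the result as a lower bound.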

Step 3: Experimental Design

Principles:

Start small: CartPole/simple envs first, then scale to harder problems
One change at a time: Don't modify three things simultaneously
Strong baselines: Compare against well-tuned baselines, not strawmen
Ablations: Show which components of your method matter
Multiple seeds: Run at least 3-5 random seeds per configuration
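To make the multiple-seeds principle concrete, here is a minimal sketch of summarizing per-seed results with a mean and a standard error instead of reporting a single run. The `summarize` helper is an illustrative name, and the scores are made-up numbers used only to show the output format.

```python
import math
import statistics


def summarize(scores):
    """Return (mean, standard error of the mean) for per-seed final scores."""
    mean = statistics.mean(scores)
    if len(scores) < 2:
        # The spread is undefined for a single seed.
        return mean, float("nan")
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, sem


# Made-up final returns from five seeds per method (illustration only).
results = {
    "baseline": [182.0, 175.0, 190.0, 168.0, 185.0],
    "ours": [195.0, 171.0, 204.0, 188.0, 192.0],
}
for name, scores in results.items():
    mean, sem = summarize(scores)
    print(f"{name}: {mean:.1f} +/- {sem:.1f} (n={len(scores)})")
```

With only a handful of seeds the standard error is itself a noisy estimate, so overlapping intervals are a reason to run more seeds rather than to claim an improvement.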

Step 4: Analysis and Writing

Be honest: Report failures and negative results — they're informative
Understand your results: Don't just report numbers — explain why things work or don't
Write as you go: Don't wait until experiments are done to start writing

Practical Advice

Managing Compute

Track all experiments in a logging system (W&B, TensorBoard)
Use config files (Hydra, YAML) — never hardcode hyperparameters
Write reproducible experiment scripts
Save checkpoints regularly
Learn to profile code before scaling up
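As one way to follow the config-file and reproducibility advice above, the sketch below keeps every hyperparameter in a single typed config object that is loaded from a file and saved back next to the run's logs. It uses JSON from the standard library as a stand-in for YAML or Hydra, and the class name, field names, and defaults are illustrative assumptions rather than any library's API.

```python
import json
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    # Field names and defaults are illustrative, not taken from any library.
    env_id: str = "CartPole-v1"
    learning_rate: float = 3e-4
    gamma: float = 0.99
    seed: int = 0


def load_config(text):
    """Build a config from JSON text; unknown keys raise TypeError instead of being ignored."""
    return ExperimentConfig(**json.loads(text))


def dump_config(config):
    """Serialize the config so it can be saved next to logs and checkpoints."""
    return json.dumps(asdict(config), sort_keys=True, indent=2)


# Override two fields; the rest fall back to the defaults above.
config = load_config('{"learning_rate": 0.001, "seed": 3}')
random.seed(config.seed)  # seed every other RNG you use in the same way
print(dump_config(config))
```

Rejecting unknown keys is a deliberate choice here: a misspelled hyperparameter name fails loudly instead of silently running with the default.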

Collaborating

Use git properly (branches, meaningful commits)
Document your code — your future self will thank you
Share negative results with your lab — prevents duplicated effort
Attend reading groups and present papers

Staying Current

Follow key researchers on Twitter/X and Google Scholar
Set up arXiv alerts for relevant keywords
Attend 1-2 conferences per year (NeurIPS, ICML, ICLR, CoRL, RSS, ICRA)
Engage with the open-source community (GitHub issues, discussions)

Mental Health

Research is a marathon, not a sprint:

Negative results are normal and expected
Comparing yourself with others' highlight reels is misleading
Take breaks — some of the best ideas come when you're not at your desk
Build a support network of peers and mentors