COMPLETE GUIDE

How to Use the Data

From opening a CSV in Excel for the first time, to running a machine learning model on 330 million bars. Step by step.

All examples are for research and educational purposes only. Not financial advice.

🅣 BEGINNER — No coding required

Start Here: Open the Data in Excel

You just downloaded a dataset. What now? The simplest thing you can do is open the CSV file in Excel or Google Sheets and start exploring. No Python, no setup, no code.

1. Download a dataset from the catalog

Go to the Data Catalog, pick any D1 dataset (daily bars are smallest and easiest to start with). For example: XAUUSD D1 (Gold, daily, from 2003).

2. Open the CSV file

Locate the XAUUSD_D1_full.csv file. Double-click to open in Excel or drag it into Google Sheets. You'll see columns: time, open, high, low, close, volume, ma20, ma50, rsi14…

3. Find the 2008 financial crisis

Filter the time column for September 2008. Gold surged from $780 to $900 in just 3 weeks as markets collapsed. This is visible right there in your spreadsheet — no tool needed.

4. Try making a chart

Select the close column and create a line chart. Add the ma50 column as a second line. You just built a moving average chart of Gold for 20+ years. For research only.

What you just did
You explored 20 years of gold price history. You saw how prices behaved during real historical crises. You used moving averages as reference. This is the same data professional researchers use. Not financial advice — research only.

All information provided by MARKETDPRO is for research purposes only and does not constitute financial advice.


⚙️ INTERMEDIATE — Python basics helpful

Your First Backtest in Python

Backtesting means: "If I had followed this rule historically, what would have happened?" It is a research technique — not a prediction tool, and not financial advice. Here's how to run one in under 20 lines of Python.

Setup (one-time, 2 minutes)

# Install required libraries (run once in terminal)
pip install pandas pyarrow matplotlib

Load the data

import pandas as pd

# Load EURUSD D1 Parquet (faster than CSV for large files)
df = pd.read_parquet('EURUSD_D1.parquet')
df['time'] = pd.to_datetime(df['time'])
df = df.set_index('time')

# MA20 and MA50 are already in the file — no calculation needed
print(df[['close', 'ma20', 'ma50', 'rsi14']].tail())

Simple MA crossover research (for study only)

# Research question: when ma20 is above ma50, what happens next?
# This is a research exercise — not a trading recommendation

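df['signal'] = (df['ma20'] > df['ma50']).astype(int)
# Return from today's close to tomorrow's close, stored on today's row
df['next_ret'] = df['close'].pct_change().shift(-1)

# Skip warm-up rows where a moving average is still missing;
# otherwise they would be counted as signal == 0
valid = df.dropna(subset=['ma20', 'ma50', 'next_ret'])

mean_up = valid.loc[valid['signal'] == 1, 'next_ret'].mean()
mean_down = valid.loc[valid['signal'] == 0, 'next_ret'].mean()

print(f"When MA20 > MA50: avg next-day return = {mean_up:.4f}")
print(f"When MA20 <= MA50: avg next-day return = {mean_down:.4f}")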
# Results are historical patterns only. Not predictions. Not advice.
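
The comparison above depends on pairing today's signal with tomorrow's return, and a mean says little without the number of days behind it. Here is a minimal sketch of the same idea that also reports the spread and count per group. The helper name `signal_return_summary` is hypothetical, and the prices are a synthetic random walk, not data from any dataset.

```python
import numpy as np
import pandas as pd

def signal_return_summary(df: pd.DataFrame, fast: str = "ma20", slow: str = "ma50") -> pd.DataFrame:
    """Mean, std and count of next-day returns, grouped by fast > slow.

    Hypothetical helper for illustration; assumes columns close, `fast`, `slow`.
    """
    out = df.copy()
    # Return from today's close to tomorrow's close, stored on today's row,
    # so the signal only uses information available at today's close.
    out["next_ret"] = out["close"].pct_change().shift(-1)
    # Drop warm-up rows (moving average not yet defined) and the last row (no next day).
    out = out.dropna(subset=[fast, slow, "next_ret"])
    out["fast_above_slow"] = out[fast] > out[slow]
    return out.groupby("fast_above_slow")["next_ret"].agg(["mean", "std", "count"])

# Synthetic random-walk prices, for demonstration only (not real market data).
rng = np.random.default_rng(0)
close = pd.Series(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))),
    index=pd.bdate_range("2020-01-01", periods=500),
)
demo = pd.DataFrame({"close": close})
demo["ma20"] = demo["close"].rolling(20).mean()
demo["ma50"] = demo["close"].rolling(50).mean()

summary = signal_return_summary(demo)
print(summary)
```

On a random walk the two group means should differ only by noise, which makes this a useful baseline before reading anything into results from real data.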

Plot it

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(14, 5))
df['close'].plot(ax=ax, color='#ccc', linewidth=0.8, label='Close')
df['ma20'].plot(ax=ax, color='#4da6ff', linewidth=1.2, label='MA20')
df['ma50'].plot(ax=ax, color='#f5a623', linewidth=1.2, label='MA50')
ax.legend(); ax.set_title('EURUSD D1 — Research Only')
plt.tight_layout(); plt.show()

What you just did
You ran a 20-year historical pattern study on EURUSD. The MA20/MA50 data was already there — you wrote zero calculation code. You visualized it. This is quantitative research. For research purposes only — not financial advice.

Past statistical patterns do not predict future results. All examples are for educational research only.


🚀 ADVANCED — Python + data engineering

Multi-Asset Analysis with DuckDB

When you have 40 FX pairs and 20 years of M1 data, CSV files and pandas start to slow down. DuckDB runs SQL directly on Parquet files inside your Python process, with no database server to install.

Query all FX pairs at once

import duckdb

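# Query Parquet files in place; only the aggregated result is loaded into pandas.
# The glob assumes each pair keeps its M1 files in fx/<PAIR>/M1/ (the same layout
# as the M15 example further down). A broader glob such as fx/**/*.parquet would
# mix every timeframe into the same averages.
result = duckdb.query("""
  SELECT
    symbol,
    strftime(time, '%Y-%m') AS month,
    AVG(rsi14) AS avg_rsi,
    AVG(atr14 / close) AS avg_atr_pct
  FROM read_parquet('fx/*/M1/*.parquet')
  WHERE rsi14 IS NOT NULL
  GROUP BY 1, 2
  ORDER BY 1, 2
""").df()
# Covers 40 pairs × 20 years of M1 bars in one query. For research only.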

Cross-asset correlation study

import pandas as pd

# Load daily returns for multiple assets (research purpose)
assets = {'XAUUSD': 'gold/D1.parquet',
          'USDJPY': 'fx/USDJPY/D1.parquet',
          'SPY':   'stocks/SPY/D1.parquet'}

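# Index each series by date so assets are matched on the same day,
# not on the same row number (their histories and calendars differ)
closes = pd.DataFrame({
  k: pd.read_parquet(v).set_index('time')['close']
  for k, v in assets.items()
}).dropna()  # keep only dates where every asset has a close

rets = closes.pct_change().dropna()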

print(rets.corr()) # Correlation matrix — research only
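
Assets trade on different calendars, so a correlation study needs returns matched by date, not by row position. This small sketch uses two synthetic series (made-up data driven by one shared shock) to show how much the result can change when dates are ignored.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Two synthetic assets driven by the same daily shocks (made-up data).
days = pd.date_range("2021-01-01", periods=300, freq="D")
shock = rng.normal(0, 0.01, len(days))
a = pd.Series(100 * np.exp(np.cumsum(shock)), index=days)
b = pd.Series(50 * np.exp(np.cumsum(shock + rng.normal(0, 0.002, len(days)))), index=days)

# Asset B only has bars on weekdays, like a stock; asset A has a bar every day.
b = b[b.index.dayofweek < 5]

# Wrong: dropping the dates pairs row i of A with row i of B (different days).
by_position = pd.DataFrame({
    "A": a.reset_index(drop=True).pct_change(),
    "B": b.reset_index(drop=True).pct_change(),
}).corr().loc["A", "B"]

# Better: keep only the dates both assets share, then compute returns.
closes = pd.DataFrame({"A": a, "B": b}).dropna()
by_date = closes.pct_change().corr().loc["A", "B"]

print(f"matched by row position: {by_position:.2f}")
print(f"matched by date:         {by_date:.2f}")
```

The two series are nearly identical by construction, yet only the date-matched version shows it.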

Feature engineering for ML (research example)

import polars as pl

# Sort by time after reading several files: rolling windows need bars in order
df = pl.read_parquet('fx/EURUSD/M15/*.parquet').sort('time')

# Build features — all indicators already present
df = df.with_columns([
  (pl.col('close') > pl.col('ma50')).alias('above_ma50'),
  (pl.col('rsi14') < 30).alias('oversold'),
  pl.col('atr14').rolling_mean(20).alias('avg_vol'),
])
# Feed into sklearn, XGBoost, PyTorch — for research only
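
Before features like these go into a model, they need a label and a train/test split that respects time order. The sketch below shows one way to do that. It uses pandas only to stay self-contained; the same steps apply in polars. The helper `make_dataset`, the next-bar label, and the 80/20 split are assumptions for illustration, and the `rsi14` and `atr14` values are random placeholders, not real indicators.

```python
import numpy as np
import pandas as pd

def make_dataset(df: pd.DataFrame, train_frac: float = 0.8):
    """Build features, a next-bar label, and a time-ordered train/test split."""
    df = df.sort_values("time").reset_index(drop=True)
    feats = pd.DataFrame({
        "above_ma50": (df["close"] > df["ma50"]).astype(int),
        "oversold": (df["rsi14"] < 30).astype(int),
        "avg_vol": df["atr14"].rolling(20).mean(),
    })
    # Label: 1 if the next bar closes higher. It uses future data by design,
    # so it must never be used as an input feature.
    next_close = df["close"].shift(-1)
    label = (next_close > df["close"]).astype(int)
    # Keep rows where every input and the next close actually exist.
    keep = (
        feats.notna().all(axis=1)
        & df["ma50"].notna()
        & df["rsi14"].notna()
        & next_close.notna()
    )
    X, y = feats[keep], label[keep]
    # No shuffling: train on the earlier part, test on the later part.
    split = int(len(X) * train_frac)
    return X.iloc[:split], X.iloc[split:], y.iloc[:split], y.iloc[split:]

# Synthetic M15 bars for demonstration only.
rng = np.random.default_rng(2)
n = 400
bars = pd.DataFrame({
    "time": pd.date_range("2024-01-01", periods=n, freq="15min"),
    "close": 1.10 + np.cumsum(rng.normal(0, 0.001, n)),
    "rsi14": rng.uniform(10, 90, n),         # placeholder values, not a real RSI
    "atr14": rng.uniform(0.0005, 0.002, n),  # placeholder values, not a real ATR
})
bars["ma50"] = bars["close"].rolling(50).mean()

X_train, X_test, y_train, y_test = make_dataset(bars)
print(len(X_train), len(X_test))
```

Splitting by position instead of at random keeps every test row later than every training row, which is closer to how a model would meet new data.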

What you can build
Feature engineering pipelines. Cross-asset correlation matrices. Risk-off / risk-on regime detectors. Volatility forecasting models. All with data that is already cleaned and indicator-enriched. For research purposes only — not financial advice.

Machine learning models trained on historical data may not generalize to future market conditions. All examples are for research only.


🤔 IDEAS GALLERY

What Could You Build?

These are research ideas from the MARKETDPRO community. None of these are financial advice — they are starting points for your own exploration.

📅 Crisis Playbook

Compare how Gold, JPY, and USD behaved during the 2008 financial crisis, the 2020 COVID crash, and the 2023 SVB bank run. Build a personal "crisis pattern" reference for research.

🌞 Session Volatility Map

Use M1 data to measure average volatility by hour of day for each FX pair. Find when each pair is most active. Research only.
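
One way to start on this idea is to average each bar's high-low range, relative to its close, by hour of day. The sketch below assumes a DataFrame indexed by timestamp with `high`, `low` and `close` columns. The helper name `hourly_range_profile` is hypothetical, and the bars are synthetic with a made-up busy window from 13:00 to 16:59.

```python
import numpy as np
import pandas as pd

def hourly_range_profile(bars: pd.DataFrame) -> pd.Series:
    """Average bar range as a fraction of close, grouped by hour of day."""
    rel_range = (bars["high"] - bars["low"]) / bars["close"]
    return rel_range.groupby(bars.index.hour).mean()

# Three days of synthetic M1 bars, for demonstration only.
rng = np.random.default_rng(3)
idx = pd.date_range("2024-01-01", periods=3 * 24 * 60, freq="min")

# Made-up activity pattern: bars are wider between 13:00 and 16:59.
width = np.where((idx.hour >= 13) & (idx.hour < 17), 0.0008, 0.0002)
half = width * rng.uniform(0.5, 1.0, len(idx)) / 2
close = np.full(len(idx), 1.10)
bars = pd.DataFrame({"high": close + half, "low": close - half, "close": close}, index=idx)

profile = hourly_range_profile(bars)
print(profile.round(6))
print("most active hour:", profile.idxmax())
```

Hours here follow whatever timezone the timestamps are stored in, so it is worth confirming that before labelling sessions.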

📈 RSI Extremes Study

What happens after RSI drops below 20 on BTCUSD? Study all occurrences in the data. Discover historical patterns for research.
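
A study like this is usually set up as an event study: find each bar where the condition first becomes true, then measure the return over a fixed number of bars afterwards. Below is a minimal sketch on a tiny hand-made table. The helper name `forward_returns_after`, the 2-bar horizon, and all the numbers are illustrative assumptions.

```python
import pandas as pd

def forward_returns_after(df: pd.DataFrame, condition: pd.Series, horizon: int = 5) -> pd.Series:
    """Return from each event bar's close to the close `horizon` bars later."""
    cond = condition.astype(bool)
    # An event is the first bar where the condition turns true, so one long
    # oversold stretch counts once, not once per bar.
    events = cond & ~cond.shift(1, fill_value=False)
    fwd = df["close"].shift(-horizon) / df["close"] - 1
    # Events too close to the end of the data have no forward return; drop them.
    return fwd[events].dropna()

# Hand-made values for demonstration only (not real prices or RSI readings).
demo = pd.DataFrame({
    "close": [100, 90, 80, 88, 96, 100, 70, 77, 80, 85],
    "rsi14": [50, 25, 15, 18, 40, 55, 10, 30, 45, 50],
})

result = forward_returns_after(demo, demo["rsi14"] < 20, horizon=2)
print(result)
```

Rows 2 and 3 are both below 20, but only row 2 counts as an event. Counting every bar of the same stretch would overstate how many independent occurrences the data contains.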

🌎 Yield Curve Signal Research

Combine Bond macro data with Gold D1. Study whether yield curve inversions correlate with Gold movements. Historical analysis only.

⚡ Crypto Halving Analysis

Mark BTC halving dates on the chart. Measure price behavior in the 6 months before and after each event. Research purposes only.

👨‍💻 Build Your Own Dashboard

Load multiple D1 datasets into a Grafana or Streamlit dashboard. Create a personal market overview tool for research and learning.

🏆 Strategy Parameter Study

Test the same MA crossover rule across 40 FX pairs. Which pair shows the strongest historical patterns? Backtest research only.

💪 Portfolio Diversification Research

Calculate rolling correlations between Stocks, Gold, and Crypto. Study when correlations break down (crises). For research use.
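
A rolling correlation can be computed directly on two return series with pandas. The sketch below uses synthetic returns, built so the two series are unrelated in the first half and move together in the second. The 60-bar window is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 500
idx = pd.bdate_range("2022-01-03", periods=n)

# Synthetic daily returns (made-up data): B is independent of A in the
# first half, then closely follows A in the second half.
ret_a = rng.normal(0, 0.01, n)
ret_b = np.where(
    np.arange(n) < n // 2,
    rng.normal(0, 0.01, n),
    ret_a + rng.normal(0, 0.002, n),
)
rets = pd.DataFrame({"A": ret_a, "B": ret_b}, index=idx)

window = 60  # illustrative window length, in bars
rolling_corr = rets["A"].rolling(window).corr(rets["B"])

# Average the windows that sit fully inside each half.
early = rolling_corr.iloc[window: n // 2].mean()
late = rolling_corr.iloc[n // 2 + window:].mean()
print(f"average rolling correlation, first half:  {early:.2f}")
print(f"average rolling correlation, second half: {late:.2f}")
```

Plotting `rolling_corr` over time shows the regime shift as a step. A shorter window reacts faster but is noisier.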

Ready to start your own research?

Browse 94+ datasets. Preview free samples. Buy only what you need.
