Enterprise Data Infrastructure

What's Included

Not raw data.
A complete, processed data infrastructure.

Every dataset is cleaned, validated, and enriched with indicators before delivery. We ship what quant teams actually need — not raw ticks.

📈

Multi-Asset Coverage

FX 25 pairs · Crypto 12 symbols · Gold & Silver · Global Indices · Bond ETFs. M1 through D1 timeframes. Up to 26 years of history.

⚡

Pre-Calculated Indicators

Every bar includes MA20/50/100/200, ATR14, RSI14, RSI zone tags, MA slope direction, and session classification. No preprocessing required.

📋

Correlation Event Database

347,000+ correlation breakdown events with pre/post correlation, RSI zone, market phase, and 1d/3d/7d/30d price outcome data. Fully proprietary.

🔍

Validated Signal Events

OOS-validated directional conditions with 65–81% accuracy (n≥192). Phase 8 tested. No look-ahead bias. Each event includes win rate and sample count.
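Because each event ships with both a win rate and a sample count, a buyer can gauge how much sampling noise sits behind a quoted accuracy. The sketch below uses the standard Wilson score interval. It assumes events are independent trials, which may not hold for clustered market events, so treat the output as a rough guide rather than a statement about the dataset.

```python
import math

def wilson_interval(win_rate: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion observed over n independent trials."""
    denom = 1.0 + z * z / n
    centre = (win_rate + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(win_rate * (1 - win_rate) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width

# Both ends of the published accuracy range, evaluated at the minimum sample size.
for rate in (0.65, 0.81):
    low, high = wilson_interval(rate, n=192)
    print(f"win rate {rate:.0%}, n=192 -> 95% interval {low:.1%} to {high:.1%}")
```

Larger sample counts narrow the interval, which is why the per-event sample count matters as much as the win rate itself.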

📊

Yield Curve Intelligence

6,556 days of yield spread data (10Y–2Y, 10Y–3M, 30Y–10Y). Inversion detection, 34 historical episodes with start/end dates. FRED-sourced, calculated in-house.
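For readers who want to see what spread calculation and inversion detection can look like, here is a minimal sketch. The input column names (dgs10, dgs2, dgs3mo, dgs30) follow the sample code on this page. The rule that an inversion is a negative 10Y–2Y spread, the helper names, and the eight rows of data are all assumptions made for illustration, not MARKETDPRO's actual method.

```python
import pandas as pd

def label_inversions(yc: pd.DataFrame, spread_col: str = "spread_10y2y") -> pd.DataFrame:
    """Add the three spreads plus an episode id for each run of inverted days."""
    out = yc.copy()
    out["spread_10y2y"] = out["dgs10"] - out["dgs2"]
    out["spread_10y3m"] = out["dgs10"] - out["dgs3mo"]
    out["spread_30y10y"] = out["dgs30"] - out["dgs10"]
    # Assumption: "inverted" means the chosen spread is below zero.
    inverted = out[spread_col] < 0
    # An episode starts on an inverted day whose previous day was not inverted.
    starts = inverted & ~inverted.shift(1, fill_value=False)
    out["episode_id"] = starts.cumsum().where(inverted)
    return out

def summarize_episodes(labeled: pd.DataFrame) -> pd.DataFrame:
    """One row per episode with its first day, last day, and length in rows."""
    grouped = labeled.dropna(subset=["episode_id"]).groupby("episode_id")["date"]
    return grouped.agg(start="min", end="max", days="count").reset_index(drop=True)

# Made-up yields, only to exercise the logic.
yc = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=8, freq="B"),
    "dgs10": [4.0] * 8,
    "dgs2": [3.5, 4.2, 4.3, 3.9, 3.8, 4.1, 4.1, 4.1],
    "dgs3mo": [3.0] * 8,
    "dgs30": [4.5] * 8,
})
episodes = summarize_episodes(label_inversions(yc))
print(episodes)
```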

🔒

QC Verified + Documented

Each product includes metadata.json (QC score, missing rate, row count), README.md, and sample Python code. Quality score 99.8+/100 for all delivered packages.
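A quick way to use metadata.json is to compare it with the file you actually loaded. The sketch below assumes hypothetical key names (qc_score, missing_rate, row_count) and assumes the missing rate is the share of empty cells; confirm both against the README shipped with your dataset.

```python
import json
import pandas as pd

def check_against_metadata(df: pd.DataFrame, metadata: dict,
                           min_qc_score: float = 99.8) -> list[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # Hypothetical key names: confirm them against the dataset's README.
    if len(df) != metadata["row_count"]:
        problems.append(f"row count {len(df)} != metadata {metadata['row_count']}")
    if metadata["qc_score"] < min_qc_score:
        problems.append(f"QC score {metadata['qc_score']} is below {min_qc_score}")
    # Assumption: missing rate is the share of NaN cells across all columns.
    observed_missing = float(df.isna().to_numpy().mean())
    if observed_missing > metadata["missing_rate"] + 1e-9:
        problems.append(f"observed missing rate {observed_missing:.4%} exceeds declared rate")
    return problems

# Stand-in for the parsed contents of metadata.json, so the sketch runs without files.
metadata = json.loads('{"qc_score": 99.9, "missing_rate": 0.0, "row_count": 3}')
df = pd.DataFrame({"close": [1.0, 1.1, 1.2], "rsi14": [55.0, 60.0, 48.0]})
problems = check_against_metadata(df, metadata)
print(problems or "all checks passed")
```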

Data Processing

Four levels of processing. Every product starts at Level 1.

We never ship raw price data. The minimum deliverable is indicator-enriched, QC-verified data. Higher tiers include proprietary analytical outputs.

L0

Raw OHLCV

Open, High, Low, Close, Volume only.
Not sold.

L1

Indicator-Enriched

+ MA20/50/100/200
+ ATR14, RSI14
+ RSI Zone tags
+ Session labels

L2

Statistical Analysis

+ Zone return stats
+ Volatility tiers
+ Trend counters
+ Breakout detection

L3

Proprietary Events

+ Correlation breaks
+ ZigZag reversals
+ Signal events
+ OOS validation

L0 raw data is not redistributed. All products are Level 1 minimum — transformed, enhanced analytical datasets.
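To make the Level 1 columns concrete, the sketch below derives comparable columns from plain OHLC bars. It is an illustration under stated assumptions, not the MARKETDPRO pipeline: simple moving averages, Wilder-style smoothing for ATR14 and RSI14, and zone cut-offs at 30/50/70 (only the 50 to 70 "bullish" band is stated on this page). The function name and the random-walk input are hypothetical.

```python
import numpy as np
import pandas as pd

def add_l1_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Append MA, ATR14, RSI14 and RSI zone columns to an OHLC frame."""
    out = df.copy()
    for n in (20, 50, 100, 200):
        out[f"ma{n}"] = out["close"].rolling(n).mean()  # assumed simple MA

    # True range: the largest of the bar range and the two gaps from the prior close.
    prev_close = out["close"].shift(1)
    true_range = pd.concat([
        out["high"] - out["low"],
        (out["high"] - prev_close).abs(),
        (out["low"] - prev_close).abs(),
    ], axis=1).max(axis=1)
    # Wilder-style smoothing expressed as an EMA with alpha = 1/14.
    out["atr14"] = true_range.ewm(alpha=1 / 14, adjust=False, min_periods=14).mean()

    delta = out["close"].diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / 14, adjust=False, min_periods=14).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / 14, adjust=False, min_periods=14).mean()
    out["rsi14"] = 100 - 100 / (1 + gain / loss)

    # Assumed cut-offs: <30 oversold, 30-50 bearish, 50-70 bullish, >=70 overbought.
    out["rsi14_zone"] = pd.cut(
        out["rsi14"],
        bins=[-np.inf, 30, 50, 70, np.inf],
        labels=["oversold", "bearish", "bullish", "overbought"],
        right=False,
    )
    return out

# Synthetic random-walk bars, only to exercise the function.
rng = np.random.default_rng(0)
close = 100 + rng.normal(0, 1, 300).cumsum()
open_ = np.concatenate([[100.0], close[:-1]])
bars = pd.DataFrame({
    "open": open_,
    "high": np.maximum(open_, close) + 0.1,
    "low": np.minimum(open_, close) - 0.1,
    "close": close,
})
enriched = add_l1_indicators(bars)
print(enriched[["close", "ma20", "atr14", "rsi14", "rsi14_zone"]].tail())
```

Recomputing a handful of bars this way and comparing them with the delivered columns is a cheap sanity check, keeping in mind that small differences can come from a different smoothing convention.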

Product Catalog

Available datasets — all QC verified

Below is a structured view of what's available. Custom combinations and time ranges are available on request.

Product	Type	Assets	Period	Processing Level	Formats	Status
Correlation Break Events	Event database	FX 35 pairs	2000–2026	L3 Proprietary	CSV + JSON	READY
Phase 8 Validated Conditions	Research dataset	FX major	OOS tested	L3 Proprietary	CSV + Report	READY
ZigZag Reversal Dataset	Event database	XAUUSD M15	2015–2026	L3 Proprietary	CSV	READY
Correlation Matrix Historical	Matrix database	FX 35×35	2000–2026	L3 Proprietary	CSV + Parquet	READY
Yield Curve Intelligence	Macro dataset	US Treasury	2000–2026	L3 Calculated	CSV + JSON	READY
MA Cross Signal Pack	Signal events	7 major symbols	H1/H4/D1	L2 Enhanced	CSV	READY
RSI Strategy Dataset	Indicator + stats	FX / Crypto	2000–2026	L2 Statistical	CSV + Parquet	READY
ATR Volatility Pack	Volatility dataset	FX / Crypto / Gold	2000–2026	L2 Statistical	CSV + Parquet	READY
Session Analysis Pack	Session dataset	FX 25 pairs	2000–2026	L2 Statistical	CSV	READY
BTC / ETH M1 Indicator Pack	Indicator-enriched	BTCUSD, ETHUSD	2017–2026	L1 Enriched	CSV + Parquet	QC 99.8
Crypto 12-Pack M1	Indicator-enriched	12 symbols	2017–2026	L1 Enriched	CSV + Parquet	QC 99.8+
Crisis Period Analysis Pack	Research report	All major	Crisis periods	L3 Analytical	CSV + PDF	On Request
Custom / Enterprise Pack	Custom	Any combination	Any range	Any Level	Any format	Quote Required

Sample Code

Works right out of the box

Every dataset ships with documented columns, metadata.json, and tested sample code. Paste and run.

        
# Load any MARKETDPRO indicator-enriched dataset
import pandas as pd

# Option A: Load single year (fast)
df = pd.read_parquet("BTCUSD/M1/2024.parquet")

# Option B: Load all years at once
df = pd.read_parquet("BTCUSD/M1/")  # loads all *.parquet files

# Option C: Load from CSV
df = pd.read_csv("BTCUSD/M1/BTCUSD_M1_full.csv", parse_dates=["time"])
df = df.set_index("time")

# Check what's included — columns are pre-calculated
print(df.columns.tolist())
# ['open', 'high', 'low', 'close', 'volume',
#  'ma20', 'ma50', 'ma100', 'ma200',
#  'atr14', 'rsi14', 'rsi14_zone']

# Check data range and quality
print(f"Range: {df.index.min()} to {df.index.max()}")
print(f"Total bars: {len(df):,}")
print(df.describe())
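Beyond describe(), a quick completeness check is to compare the timestamps you received with a full minute grid. This is a sketch for markets that trade around the clock, such as crypto; for FX the expected grid would need to exclude weekends. The function name is hypothetical and the index is synthetic.

```python
import pandas as pd

def missing_bar_rate(index: pd.DatetimeIndex, freq: str = "min") -> float:
    """Share of expected bars that are absent between the first and last timestamp."""
    expected = pd.date_range(index.min(), index.max(), freq=freq)
    return 1.0 - index.nunique() / len(expected)

# Synthetic example: 100 one-minute bars with two removed.
full = pd.date_range("2024-01-01", periods=100, freq="min")
index = full.delete([10, 11])
rate = missing_bar_rate(index)
print(f"Missing bars: {rate:.2%}")
```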

      

        
# MA Cross Strategy: using pre-calculated MA columns
import pandas as pd

df = pd.read_parquet("EURUSD/M1/")

# MA20/50/100/200 ship with the dataset and need no recalculation.
# The EMA12/26 pair used below is not included, so compute it from close.
# adjust=False gives the standard recursive EMA.
df['ema12'] = df['close'].ewm(span=12, adjust=False).mean()
df['ema26'] = df['close'].ewm(span=26, adjust=False).mean()

# Golden cross (EMA12 crosses above EMA26)
df['golden_cross'] = (
    (df['ema12'] > df['ema26']) &
    (df['ema12'].shift(1) <= df['ema26'].shift(1))
)

# Filter: only crosses where RSI is in 'bullish' zone (RSI 50-70)
golden_filtered = df[
    df['golden_cross'] &
    (df['rsi14_zone'] == 'bullish')
]

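# Forward return over the next 1,440 M1 bars. This equals 24 hours only when
# no bars are missing; weekend and holiday gaps in FX stretch the window.
df['return_24h'] = df['close'].pct_change(1440).shift(-1440)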
signal_returns = df.loc[golden_filtered.index, 'return_24h']

print(f"Total signals: {len(golden_filtered)}")
print(f"Win rate: {(signal_returns > 0).mean():.1%}")
print(f"Avg return: {signal_returns.mean():.4%}")
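The 1,440-bar shift above counts bars, not clock time. Where the data has session gaps, a timestamp-based lookup keeps the horizon at a true 24 hours and leaves the return empty when no bar exists at the target time. The sketch below is one way to do that, assuming a unique, sorted DatetimeIndex; the function name and the synthetic Thursday, Friday, Monday series are illustrative only.

```python
import numpy as np
import pandas as pd

def forward_return(close: pd.Series, horizon: str = "24h") -> pd.Series:
    """close(t + horizon) / close(t) - 1, matched on timestamps rather than bar counts."""
    # Exact-timestamp lookup: NaN wherever no bar exists at t + horizon.
    future = close.reindex(close.index + pd.Timedelta(horizon))
    return pd.Series(future.to_numpy() / close.to_numpy() - 1.0, index=close.index)

# Synthetic M1 series covering Thursday, Friday and Monday (weekend missing).
days = [pd.date_range(d, periods=1440, freq="min")
        for d in ("2024-01-04", "2024-01-05", "2024-01-08")]
idx = days[0].append(days[1:])
close = pd.Series(np.linspace(1.0, 2.0, len(idx)), index=idx)

by_time = forward_return(close)                       # true 24h horizon
by_bars = close.pct_change(1440).shift(-1440)         # bar-count horizon

print(by_time.loc["2024-01-05"].isna().all())   # Friday has no bar 24h later
print(by_bars.loc["2024-01-05"].notna().all())  # bar counting silently uses Monday
```

On continuous data the two approaches agree; they diverge only across gaps, which is where the bar-count version mislabels a 72-hour move as a 24-hour one.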

      

        
# RSI Zone Analysis: zone-based return statistics
import pandas as pd

df = pd.read_parquet("XAUUSD/M1/")  # Gold M1 — 25 years

# RSI zone column is pre-tagged in every bar
# Values: 'overbought' | 'bullish' | 'bearish' | 'oversold'

# Forward return (next 24 bars = ~24 minutes for M1)
df['fwd_return'] = df['close'].pct_change(24).shift(-24)

# Group by RSI zone and calculate stats
zone_stats = df.groupby('rsi14_zone')['fwd_return'].agg([
    'count',
    'mean',
    ('win_rate', lambda x: (x > 0).mean()),
    ('avg_win', lambda x: x[x > 0].mean()),
    ('avg_loss', lambda x: x[x < 0].mean()),
])

zone_stats['mean'] = (zone_stats['mean'] * 100).round(4)
zone_stats['win_rate'] = (zone_stats['win_rate'] * 100).round(2)
print(zone_stats.to_string())

      

        
# Correlation Break Events: proprietary dataset
import pandas as pd

# Load the correlation event database (347K+ events)
events = pd.read_csv("correlation_break_events_2000_2026.csv", parse_dates=["event_date"])

# Columns: event_date, symbol, pair, corr_before, corr_after,
#          rsi14, rsi_zone, phase, direction,
#          return_1d, return_3d, return_7d, return_30d

# Example: Events where RSI is in oversold zone
oversold_events = events[events['rsi_zone'] == 'oversold']

# Calculate directional accuracy at 7-day horizon
correct = (
    (oversold_events['direction'] == 'up') &
    (oversold_events['return_7d'] > 0)
) | (
    (oversold_events['direction'] == 'down') &
    (oversold_events['return_7d'] < 0)
)

print(f"Oversold zone events: {len(oversold_events):,}")
print(f"7-day directional accuracy: {correct.mean():.1%}")
print(f"Average 7d return: {oversold_events['return_7d'].mean():.4%}")
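The same accuracy check can be run for every RSI zone and every outcome horizon in one pass. The sketch below reuses the column names listed above (rsi_zone, direction, return_1d through return_30d) on four made-up rows; the helper name is hypothetical. As in the example above, a zero return or an unrecognized direction counts as a miss.

```python
import pandas as pd

HORIZONS = ["return_1d", "return_3d", "return_7d", "return_30d"]

def directional_accuracy(events: pd.DataFrame, horizons=HORIZONS) -> pd.DataFrame:
    """Hit rate per RSI zone and horizon: a hit is a return whose sign matches direction."""
    sign = events["direction"].map({"up": 1, "down": -1})
    # Multiply each return by +1/-1 so that a positive product means a correct call.
    hits = events[list(horizons)].mul(sign, axis=0) > 0
    return hits.groupby(events["rsi_zone"]).mean()

# Made-up rows, only to exercise the function.
events = pd.DataFrame({
    "rsi_zone":   ["oversold", "oversold", "bullish", "bullish"],
    "direction":  ["up", "up", "down", "up"],
    "return_1d":  [0.002, -0.001, -0.003, 0.001],
    "return_3d":  [0.004, 0.002, 0.001, 0.002],
    "return_7d":  [-0.001, 0.003, -0.002, 0.004],
    "return_30d": [0.010, 0.020, 0.005, -0.010],
})
accuracy = directional_accuracy(events)
print(accuracy)
```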

      

        
# Yield Curve Intelligence: inversion analysis
import pandas as pd

# Load yield curve data (6,556 days, 2000–2026)
yc = pd.read_csv("yield_curve_2000_2026.csv", parse_dates=["date"])

# Columns: date, dgs10, dgs2, dgs3mo, dgs30,
#          spread_10y2y, spread_10y3m, spread_30y10y,
#          curve_regime, inversion_days, event_type
# curve_regime: 'normal' | 'flat' | 'inverted'

# Count days spent in the inverted regime
inversions = yc[yc['curve_regime'] == 'inverted']
print(f"Total inversion days: {len(inversions)}")

# Show all 34 inversion episodes
episodes = yc[yc['event_type'] == 'INVERSION_START'][['date', 'spread_10y2y']]
print(episodes.to_string())

# Label every row with the regime 60 rows (about 60 trading days) later.
# For an after-inversion study, filter these labels to the rows where an inversion ends.
yc['next_regime'] = yc['curve_regime'].shift(-60)

      

Use Cases

What our clients build with this data

From individual researchers to SaaS teams, here's what the data enables.

Case 01

EA Backtesting System

Use indicator-enriched M1 data to build and validate Expert Advisors. Pre-calculated ATR14 and MA columns reduce backtesting time by 80%. Parquet format loads 5M bars in under 3 seconds.

→ Faster strategy validation

Case 02

Signal SaaS Product

License the Correlation Break Events dataset as the intelligence layer for your own signal delivery product. 347K+ validated events with outcomes — the research is done. Focus on the product.

→ Launch in weeks, not months

Case 03

ML Model Training

BTC M1 data from 2017 to 2026 with RSI zones, ATR tiers, and session labels: structured feature columns ready for gradient boosting or neural network training. No feature engineering required.

→ Ready-to-train feature matrix
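When the label is a forward return, a random train/test split leaks future prices into training. A chronological split with a gap as long as the label horizon avoids that. The sketch below shows the idea on synthetic data; the function name, the 24-bar horizon, and the 20% test share are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

def chronological_split(df: pd.DataFrame, label_horizon: int, test_fraction: float = 0.2):
    """Split by time, dropping the last label_horizon training rows before the test set."""
    split = len(df) - int(len(df) * test_fraction)
    # Training labels look label_horizon bars ahead, so stop training that many rows early.
    train = df.iloc[: max(split - label_horizon, 0)]
    test = df.iloc[split:]
    return train, test

# Synthetic feature frame standing in for an indicator-enriched M1 dataset.
rng = np.random.default_rng(0)
idx = pd.date_range("2024-01-01", periods=1000, freq="min")
df = pd.DataFrame({
    "close": 100 + rng.normal(0, 0.1, 1000).cumsum(),
    "rsi14": rng.uniform(0, 100, 1000),
    "atr14": rng.uniform(0.1, 1.0, 1000),
}, index=idx)

HORIZON = 24
df["fwd_return"] = df["close"].pct_change(HORIZON).shift(-HORIZON)
df = df.dropna(subset=["fwd_return"])  # the last HORIZON rows have no label

train, test = chronological_split(df, label_horizon=HORIZON)
print(len(train), len(test), train.index[-1], test.index[0])
```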

Case 04

Quant Research

Phase 8 validated conditions: 65–81% OOS accuracy, n≥192. No look-ahead bias. Used for academic-grade strategy research, paper trading validation, and systematic signal development.

→ Publishable-quality dataset

Case 05

Macro Research

26 years of yield curve data with inversion events, spread calculations, and regime classifications. Overlay with FX/equity performance for macro-driven strategy development.

→ Full macro cycle coverage

Case 06

White-label SaaS

Purchase the Institutional Package + Annual API License. Get historical data + live updates + custom data structures. Build your own branded market intelligence product on top of our infrastructure.

→ Your brand, our infrastructure

Pricing

Three tiers. One infrastructure.

All packages include metadata.json, README, sample code, and verified QC reports. Delivered via secure download or API endpoint.

Starter Data Package

$9,800/one-time

Individual researchers & developers

3 years of data per symbol
Choose up to 5 symbols
FX major pairs or Crypto
L1 Indicator-enriched (MA, ATR, RSI, Zone tags)
CSV + Parquet delivery
metadata.json + README per dataset
Sample Python code included
Email support (48h response)

Request Estimate →

How this data is classified

We take data classification seriously. Here is exactly how each product type is categorized and what you can do with it.

Data Classification

All products are processed, transformed, and enhanced analytical datasets. Raw price data (OHLCV) is collected from institutional-grade sources for internal processing only and is never redistributed as-is.
Level 1 products contain derived indicator data (MA, ATR, RSI) calculated by MARKETDPRO's internal processing pipeline. These are derived works, not raw price data.
Level 2 products contain statistical summaries, zone classifications, and volatility tiers derived from historical prices. These are original analytical outputs.
Level 3 products (correlation events, signal events, ZigZag reversals, OOS conditions) are fully proprietary outputs of MARKETDPRO's engines. They do not contain raw OHLCV data.
Yield Curve data is derived from FRED (Federal Reserve Economic Data), a public domain US government source, and enhanced with MARKETDPRO's own calculations (spread, regime, inversion detection).

Permitted Uses

Backtesting and strategy development for internal use
Machine learning model training and research
Building SaaS products or signal services using the derived outputs
Academic and institutional research
Integration into proprietary trading systems

Prohibited Uses

Redistribution of raw price data (L0) — not applicable as this is never included
Re-selling datasets without modification as a competing data product
Using as investment advice or financial recommendations
Sharing datasets with third parties outside your organization without a separate license

For detailed licensing terms, white-label agreements, or exclusive usage rights, contact contact@marketdpro.com. This data is for research and analytical purposes. Nothing here constitutes investment advice.

Ready to Build?

Tell us what you need — market, timeframe, period, and purpose. We'll put together a custom estimate within 24 hours.

📋 Request Custom Estimate · Browse Full Catalog

Own the Market Intelligence Infrastructure.
