I help manufacturing, pharma, defense, and finance companies make better decisions — through rigorous forecasting, statistical analysis, and production-grade ML systems. PhD in Mathematics.
I am a passionate Data Scientist and Systems Engineer with a deep-rooted fascination for High-Performance Computing (HPC) and rigorous statistical methodology.
My journey began with a strong foundation in statistics, leading to a Ph.D. where I explored complex data structures and inference models. However, I quickly realized that theoretical models are only as good as the systems that run them. This drove me to bridge the gap between abstract mathematics and bare-metal performance.
Today, I specialize in dismantling slow, legacy data pipelines (often written in Python or R) and rebuilding them with systems-level languages like Rust and C++, combined with modern analytical engines like DuckDB. My goal is to create data architectures that are not just accurate, but blisteringly fast and built to scale.
Forecasts are unreliable at scale
I build forecasting systems that learn and improve automatically — already proven across 500k+ products, reducing errors by 25–35% and enabling proactive planning.
Data quality issues erode trust
I create automated monitoring that catches anomalies early and delivers clear reports — so your team can trust the numbers behind every decision.
Good analysis never reaches production
I turn prototypes into reliable, fast systems your team can depend on daily — not one-off analyses that sit on someone's laptop.
Regulated industries demand proof
I deliver validated statistical methods with full audit trails that satisfy regulators — so your quality and compliance teams can sleep at night.
Every engagement follows a clear, structured process — so you always know what to expect.
I listen first. We define the problem, review your data landscape, and agree on what success looks like.
I explore the data, prototype statistical approaches, and present a clear technical plan — no black boxes.
I build production-grade systems — tested, documented, and deployed to your infrastructure with CI/CD.
Your team gets full ownership — training, documentation, and ongoing support to keep everything running.
Turning complex data into defensible decisions — Bayesian inference, Functional Data Analysis, and ML methods grounded in a PhD in Mathematics.
Reducing forecast error and enabling proactive planning — hierarchical demand models, causal inference, and automated retraining deployed at scale.
Shipping statistical methods as reliable, fast software — production Rust, C++, and Python systems that teams can actually depend on.
Getting models out of notebooks and into production — automated pipelines, containerization, and monitoring on AWS, Azure, and Google Cloud.
Consolidating fragmented data sources into a single source of truth — fast analytical platforms that feed reliable data into models and dashboards.
Making results accessible to decision-makers — interactive dashboards and automated reports that translate analytics into action.
Delivering measurable impact — reduced forecast errors, automated pipelines, and data-driven decisions across pharma, defense, energy, manufacturing, and finance.
Consulting a manufacturing company on demand forecasting in Kinaxis Maestro — selecting forecast metrics and levels, evaluating forecast quality, and maximizing the platform's forecasting capabilities.
Rigorous statistical analysis of spectral data in GMP-validated CMC environments for pharmaceutical manufacturing.
Automated ML pipeline for Order Intake, Revenue, and Cash Flow prediction, used by the controlling department.
Automated data quality pipeline validating input data before it feeds into the forecasting engine, with reporting on AWS S3.
Production ML pipeline on AWS SageMaker for SKU-level demand forecasting, with a clear data pipeline that lets the in-house data scientists run reproducible experiments.
Scheduled monthly demand planning pipeline running twice per cycle — one run for APO data and one for Kinaxis data — embedded in a strict demand planning workflow.
Specialized forecasting engine for spare parts, handling intermittent demand patterns and optimizing safety stock levels to minimize stockouts and improve equipment uptime.
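A classic starting point for intermittent demand is Croston's method. Below is a minimal Python sketch of the textbook algorithm, not the engine's actual code; the smoothing constant `alpha` is an illustrative choice:

```python
def croston(demand, alpha=0.1):
    """Croston's method for intermittent demand: smooth non-zero
    demand sizes and inter-demand intervals separately with simple
    exponential smoothing; the per-period forecast is their ratio."""
    z = None  # smoothed demand size
    p = None  # smoothed inter-demand interval
    q = 1     # periods since last non-zero demand
    for d in demand:
        if d > 0:
            if z is None:  # initialize on the first non-zero demand
                z, p = d, q
            else:
                z = z + alpha * (d - z)
                p = p + alpha * (q - p)
            q = 1
        else:
            q += 1
    return z / p if z is not None else 0.0

# roughly 6 units every 3 periods -> about 2 units per period
print(croston([0, 0, 6, 0, 0, 6, 0, 0, 6]))
```

Separating sizes from intervals is what keeps the forecast from collapsing toward zero on the many zero-demand periods typical of spare parts.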
ETL consolidation of MS Dynamics 365 data sources into a unified risk scoring engine for Purchasing and Sales exposure assessment.
Rust-to-WebAssembly compiled forecasting engine embedded in Google Sheets, used by the controlling department for end-of-month revenue forecasting across subsidiaries.
Statistical strategy for BioPharma CMC — rigorous statistical analysis, process validation, and equivalence testing under GMP.
Containerized predictive analytics pipeline on MS Azure forecasting next-order dates via feature-engineered customer and territory models on automated weekly/monthly schedules.
Statistical time-series models in R predicting customer order windows, with dockerized Quarto reporting pipelines deployed on Azure Cloud.
Hybrid predictive maintenance combining photovoltaic generation models with statistical time-series forecasting and R Shiny monitoring dashboards.
Established Data Science practice — Credit Risk scoring models and Insurance Pricing engines on MS Azure with team mentoring and agile integration.
Distributed forecasting platform on Azure Databricks processing smart meter readings with Spark MLlib model training.
High-performance R/C++ package implementing Mahalanobis distance and Isolation Forest methods for multivariate outlier detection in global contract data.
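For a rough feel of the Mahalanobis approach, here is a NumPy sketch of the underlying statistic, not the package's R/C++ API; the 3.0 cutoff is an arbitrary example threshold:

```python
import numpy as np

def mahalanobis_outliers(X, threshold=3.0):
    """Flag rows of X whose Mahalanobis distance from the
    sample mean exceeds `threshold`."""
    mean = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mean
    # squared distance: (x - mu)^T S^{-1} (x - mu), per row
    d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)
    return np.sqrt(d2) > threshold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[0] = [10.0, 10.0, 10.0]  # inject an obvious outlier
flags = mahalanobis_outliers(X)
print(flags[0])  # the injected point is flagged
```

Because the distance accounts for correlations between variables, it catches points that look unremarkable on each axis alone but are jointly implausible.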
Whether it's demand forecasting, data quality, or getting statistical models into production — let's talk about how I can help.
Building the next generation of tools — from inventory optimization to deep learning forecasting — to solve harder problems faster.
High-performance Rust RAG framework for document ingestion, chunking, embedding, and retrieval with async pipeline orchestration.
Deterministic and probabilistic inventory optimization framework in Rust — safety stock, reorder points, replenishment policies, and Monte Carlo simulation.
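The Monte Carlo idea behind reorder-point setting can be sketched in a few lines. This is a hypothetical Python illustration (the framework itself is Rust, and every parameter here is made up): simulate total demand over the lead time many times and read off the service-level quantile.

```python
import random

def reorder_point(daily_mean, daily_sd, lead_time_days,
                  service_level=0.95, n_sims=20000, seed=42):
    """Estimate a reorder point by Monte Carlo: simulate lead-time
    demand repeatedly and take the service-level quantile."""
    rng = random.Random(seed)
    totals = sorted(
        sum(max(0.0, rng.gauss(daily_mean, daily_sd))
            for _ in range(lead_time_days))
        for _ in range(n_sims)
    )
    return totals[int(service_level * n_sims)]

# mean lead-time demand is 20 * 7 = 140; the 95% quantile adds
# a safety margin on top of that
rop = reorder_point(daily_mean=20, daily_sd=5, lead_time_days=7)
print(round(rop))
```

The simulation view generalizes cleanly where closed-form safety stock formulas break down, e.g. non-normal demand or stochastic lead times.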
Rust implementation of Amazon Chronos-2 time-series forecasting on the Burn deep learning framework with multi-backend GPU/CPU inference and finetuning.
Scikit-learn-inspired machine learning library for Rust built on ndarray — preprocessing, trees, ensembles, clustering, and cross-validation pipelines.
LLM-augmented SQL for DuckDB — call any OpenAI-compatible model from SQL to classify, extract, and enrich data with schema-typed results and atomic execution.
Git-like branching for DuckDB — isolated what-if scenarios with copy-on-write storage, row-level diffs, immutable snapshots, and embedded audit trails in a single .duckdb file.
Giving back to the community — production-tested libraries and tools available on GitHub.
Python bindings for the mlfinance AFML toolkit — a high-performance Rust implementation of methods from Advances in Financial Machine Learning by Marcos López de Prado.
R interface for high-performance Functional Data Analysis (FDA) powered by a Rust core for ultra-fast computation.
Comprehensive R package for conducting event studies in finance and economics with high-speed execution.
Implementation of Case-Based Reasoning (CBR) systems for intelligent decision support and pattern matching.
A Rust-native DuckDB extension providing a complete time-series forecasting toolkit via SQL. Integrates 32 models including AutoARIMA, AutoETS, TBATS, MSTL, and intermittent demand methods (Croston, ADIDA, IMAPA). Supports hierarchical time series, expanding/sliding window cross-validation, conformal prediction intervals, changepoint detection, and 76+ tsfresh-compatible feature extraction functions with native DuckDB parallelization.
```sql
-- Forecast 10,000 products in one query
SELECT *
FROM ts_forecast_by(
    'sales', item_id, date, quantity,
    'AutoARIMA', 12, '1M',
    MAP{'seasonal_period': '12'}
);
```

DuckDB extension for in-database regression (OLS, Ridge, Elastic Net, Quantile), hypothesis testing, and diagnostics — validated against R's statistical packages.
DuckDB extension providing 81 SQL functions for data validation, anomaly detection (Isolation Forest, DBSCAN, OutlierTree), PII masking, and data diffing.
Implementation of Kalman and Particle Filters in C++ for real-time robot navigation and environment mapping.
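The predict/update cycle at the heart of a Kalman filter fits in a few lines. Here is a minimal one-dimensional Python sketch (the project itself is C++ and multi-dimensional; `q` and `r` are illustrative noise variances):

```python
def kalman_1d(measurements, q=0.01, r=1.0):
    """1-D Kalman filter: predict grows the state uncertainty by
    process noise q, then each noisy measurement is blended in,
    weighted by the Kalman gain."""
    x, p = 0.0, 1000.0  # state estimate and its variance (diffuse prior)
    estimates = []
    for z in measurements:
        p += q                 # predict: uncertainty grows
        k = p / (p + r)        # Kalman gain
        x += k * (z - x)       # update with measurement z
        p *= (1 - k)
        estimates.append(x)
    return estimates

# noisy readings of a true value near 5
est = kalman_1d([5.2, 4.8, 5.1, 4.9, 5.0, 5.05, 4.95])
print(round(est[-1], 2))
```

The same gain-weighted blend of prediction and measurement drives the full multi-dimensional filter used in robot localization, just with matrices in place of the scalars.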
I'm Simon Müller — a mathematician turned systems engineer with over a decade of experience helping companies make better decisions through data.
After completing my PhD in Mathematics, I spent years working at the intersection of rigorous statistics and production software — first in academia, then in consulting for some of Europe's largest manufacturers, pharma companies, and financial institutions.
What sets me apart: I don't just build models — I ship them. My clients get production systems they can depend on, not prototypes that need another team to productionize. Whether that's a Bayesian forecasting engine running on AWS SageMaker or a Rust library compiled to WebAssembly running in the browser.
Based in Germany. Available for remote and on-site engagements across Europe.
Have a forecasting, data quality, or statistical challenge? Let's discuss how I can help — from short consulting engagements to full system implementation.