[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"$fVuAReBoq-D_8w0lTsKq9C9EO2dWetRbkFyKceAu-Tjs":3},{"id":4,"title":5,"slug":6,"excerpt":7,"contentMarkdown":8,"contentHtml":9,"coverImage":10,"tags":11,"status":15,"publishedAt":16,"author":17,"readingTimeMinutes":18,"seo":19,"createdAt":23,"updatedAt":23},"6a08a326b35e61ba9ea7a907","I tested 5 ML algorithms across 7 datasets. The \"best\" one didn't exist.","ml-algorithms-no-best-one","A year-long master's thesis, a Python benchmarking tool, and seven evaluation metrics. The lesson wasn't which algorithm wins — it was why the question is wrong.","# I tested 5 ML algorithms across 7 datasets. The \"best\" one didn't exist.\n\nEvery machine learning tutorial gives you the same advice: \"Try Random Forest first.\"\n\nI followed that advice for years. Then I spent a year building a tool to test it properly — and the answer turned out to be uncomfortable.\n\n## The setup\n\nFor my master's thesis, co-supervised between the University of Maribor and UPC Barcelona, I built a Python tool that benchmarks five classification algorithms — Logistic Regression, Decision Tree, Random Forest, SVM, and KNN — across seven datasets, scoring each on seven metrics (Accuracy, Precision, Recall, F1, Cohen's Kappa, Hamming Loss, Jaccard) plus training and prediction time.\n\nI expected a clear winner. I didn't get one.\n\n![Screenshot of the \"Algorithm Performance Comparison\" desktop tool built for the master's thesis — a Python GUI where you pick a dataset (e.g. Iris) and an algorithm (e.g. Logistic Regression), hit Run Algorithm, and it fills in Recall, Precision, Accuracy, F1 Score, Cohen's Kappa, Hamming Loss, Jaccard Similarity Score, and training Time, with a Generate Graph button at the bottom.](https:\u002F\u002Fkjjl82lign75qi0t.public.blob.vercel-storage.com\u002Fuploads\u002F1778878681514-3KrCLCsGKb.png)\n\n## What actually happened\n\n- On **Iris and Digits**, every algorithm performed beautifully. Pick any one.\n- On **Wine and Breast Cancer**, Logistic Regression and Random Forest dominated; SVM and KNN trailed badly.\n- On **Olivetti Faces**, SVM jumped to the front.\n- On **Covertype**, KNN was the *best* — the same KNN that was nearly the worst elsewhere. SVM, the hero of the previous dataset, collapsed.\n\n![Bar chart of Jaccard Score, Accuracy, Cohen's Kappa, and Hamming Loss across Logistic Regression, Decision Tree, Random Forest, SVM, and KNN on the Covertype dataset — KNN tops every correctness metric, Logistic Regression and SVM trail badly.](https:\u002F\u002Fkjjl82lign75qi0t.public.blob.vercel-storage.com\u002Fuploads\u002F1778878680660-EJET93a2S_.png)\n\nIf I'd run this thesis on three of those datasets, I would have left convinced Random Forest was the answer. On the other four, I'd have walked away convinced of something completely different.\n\n## Why this actually matters\n\nWe talk about ML algorithms like they have personalities — \"use this one for X, this one for Y.\" But the *dataset* has the personality. Algorithm choice is a conversation between the math and the data in front of you, not a rule you memorize once from a blog post.\n\nIn production, this means: the \"best\" model is the one you measured on your own data. Cheap, tedious benchmarking beats expensive reasoning from authority.\n\nThere's a second layer the metrics taught me. Accuracy and F1 don't always agree. Cohen's Kappa can flag a model that's beating random chance by a hair while accuracy says it's \"great.\" A model can win on one metric and lose on every other measure that actually matters for an imbalanced dataset. You don't know which metric matters until you know what a wrong prediction costs you.\n\n## What it taught me\n\nBuild the thing that measures, not the thing that argues. The tool I shipped was simpler than any of the algorithms it tested — and it answered the question better than any blog post could.\n\nAlso: track *time*, not just accuracy. A model that's 1% more accurate and 10× slower to train is the wrong choice more often than people admit.\n\nThe real takeaway, looking back: the discipline of measuring is more valuable than the model you measure.\n\nIf you want the full version — methodology, all seven datasets, every chart, the comparison tool — the [full master's thesis is open access on UPCommons](https:\u002F\u002Fupcommons.upc.edu\u002Fentities\u002Fpublication\u002F7b421410-f2ed-4a50-b736-970b7170ae0f).\n\n---\n\nWhat's your default algorithm when you start a new classification problem? Curious where the consensus sits today.\n\n#MachineLearning #DataScience #Python #ScikitLearn #MastersThesis #PredictiveAnalytics #Classification\n","\u003Ch1>I tested 5 ML algorithms across 7 datasets. The “best” one didn’t exist.\u003C\u002Fh1>\n\u003Cp>Every machine learning tutorial gives you the same advice: “Try Random Forest first.”\u003C\u002Fp>\n\u003Cp>I followed that advice for years. Then I spent a year building a tool to test it properly — and the answer turned out to be uncomfortable.\u003C\u002Fp>\n\u003Ch2>The setup\u003C\u002Fh2>\n\u003Cp>For my master’s thesis, co-supervised between the University of Maribor and UPC Barcelona, I built a Python tool that benchmarks five classification algorithms — Logistic Regression, Decision Tree, Random Forest, SVM, and KNN — across seven datasets, scoring each on seven metrics (Accuracy, Precision, Recall, F1, Cohen’s Kappa, Hamming Loss, Jaccard) plus training and prediction time.\u003C\u002Fp>\n\u003Cp>I expected a clear winner. I didn’t get one.\u003C\u002Fp>\n\u003Cp>\u003Cimg src=\"https:\u002F\u002Fkjjl82lign75qi0t.public.blob.vercel-storage.com\u002Fuploads\u002F1778878681514-3KrCLCsGKb.png\" alt=\"Screenshot of the &quot;Algorithm Performance Comparison&quot; desktop tool built for the master's thesis — a Python GUI where you pick a dataset (e.g. Iris) and an algorithm (e.g. Logistic Regression), hit Run Algorithm, and it fills in Recall, Precision, Accuracy, F1 Score, Cohen's Kappa, Hamming Loss, Jaccard Similarity Score, and training Time, with a Generate Graph button at the bottom.\" loading=\"lazy\" \u002F>\u003C\u002Fp>\n\u003Ch2>What actually happened\u003C\u002Fh2>\n\u003Cul>\n\u003Cli>On \u003Cstrong>Iris and Digits\u003C\u002Fstrong>, every algorithm performed beautifully. Pick any one.\u003C\u002Fli>\n\u003Cli>On \u003Cstrong>Wine and Breast Cancer\u003C\u002Fstrong>, Logistic Regression and Random Forest dominated; SVM and KNN trailed badly.\u003C\u002Fli>\n\u003Cli>On \u003Cstrong>Olivetti Faces\u003C\u002Fstrong>, SVM jumped to the front.\u003C\u002Fli>\n\u003Cli>On \u003Cstrong>Covertype\u003C\u002Fstrong>, KNN was the \u003Cem>best\u003C\u002Fem> — the same KNN that was nearly the worst elsewhere. SVM, the hero of the previous dataset, collapsed.\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>\u003Cimg src=\"https:\u002F\u002Fkjjl82lign75qi0t.public.blob.vercel-storage.com\u002Fuploads\u002F1778878680660-EJET93a2S_.png\" alt=\"Bar chart of Jaccard Score, Accuracy, Cohen's Kappa, and Hamming Loss across Logistic Regression, Decision Tree, Random Forest, SVM, and KNN on the Covertype dataset — KNN tops every correctness metric, Logistic Regression and SVM trail badly.\" loading=\"lazy\" \u002F>\u003C\u002Fp>\n\u003Cp>If I’d run this thesis on three of those datasets, I would have left convinced Random Forest was the answer. On the other four, I’d have walked away convinced of something completely different.\u003C\u002Fp>\n\u003Ch2>Why this actually matters\u003C\u002Fh2>\n\u003Cp>We talk about ML algorithms like they have personalities — “use this one for X, this one for Y.” But the \u003Cem>dataset\u003C\u002Fem> has the personality. Algorithm choice is a conversation between the math and the data in front of you, not a rule you memorize once from a blog post.\u003C\u002Fp>\n\u003Cp>In production, this means: the “best” model is the one you measured on your own data. Cheap, tedious benchmarking beats expensive reasoning from authority.\u003C\u002Fp>\n\u003Cp>There’s a second layer the metrics taught me. Accuracy and F1 don’t always agree. Cohen’s Kappa can flag a model that’s beating random chance by a hair while accuracy says it’s “great.” A model can win on one metric and lose on every other measure that actually matters for an imbalanced dataset. You don’t know which metric matters until you know what a wrong prediction costs you.\u003C\u002Fp>\n\u003Ch2>What it taught me\u003C\u002Fh2>\n\u003Cp>Build the thing that measures, not the thing that argues. The tool I shipped was simpler than any of the algorithms it tested — and it answered the question better than any blog post could.\u003C\u002Fp>\n\u003Cp>Also: track \u003Cem>time\u003C\u002Fem>, not just accuracy. A model that’s 1% more accurate and 10× slower to train is the wrong choice more often than people admit.\u003C\u002Fp>\n\u003Cp>The real takeaway, looking back: the discipline of measuring is more valuable than the model you measure.\u003C\u002Fp>\n\u003Cp>If you want the full version — methodology, all seven datasets, every chart, the comparison tool — the \u003Ca href=\"https:\u002F\u002Fupcommons.upc.edu\u002Fentities\u002Fpublication\u002F7b421410-f2ed-4a50-b736-970b7170ae0f\" target=\"_blank\" rel=\"noopener noreferrer\">full master’s thesis is open access on UPCommons\u003C\u002Fa>.\u003C\u002Fp>\n\u003Chr \u002F>\n\u003Cp>What’s your default algorithm when you start a new classification problem? Curious where the consensus sits today.\u003C\u002Fp>\n\u003Cp>#MachineLearning #DataScience #Python #ScikitLearn #MastersThesis #PredictiveAnalytics #Classification\u003C\u002Fp>\n",null,[12,13,14],"machine-learning","data-science","python","published","2026-05-16T17:02:30.072Z","goxkitech",3,{"metaTitle":20,"metaDescription":21,"ogImage":22},"5 ML algorithms across 7 datasets: the \"best\" didn't exist","A master's thesis that benchmarked 5 classification algorithms across 7 datasets — and dismantled the \"just use Random Forest\" default.","https:\u002F\u002Fkjjl82lign75qi0t.public.blob.vercel-storage.com\u002Fuploads\u002F1778878680660-EJET93a2S_.png","2026-05-16T17:02:30.068Z"]