AVAILABLE FOR OPPORTUNITIES

Sihle Kalolo

Data Analyst & Scientist | Turning Disruption into Insight

Curious. Resilient. Mission-Driven. I help purpose-led teams find clarity in complex data.

My Path: From Disruption to Discovery

BSc Computer Science & Applied Math

University of Fort Hare

WeThinkCode_

Software Engineering

ExploreAI Academy

Data Science

Sand Technology

Data Scientist

My journey into data hasn’t been linear. In September 2021, during the final year of my BSc in Computer Science and Applied Mathematics at the University of Fort Hare, I was in a serious car accident that temporarily slowed my academic progress.

That period became a turning point. It pushed me to reassess my path and focus on building practical, industry-ready skills. I redirected my energy into intensive, hands-on training through WeThinkCode_ and ExploreAI Academy, where I developed a strong foundation in data science and real-world problem solving.

There, I discovered my true passion: using data not just for theory, but for real-world impact. I learned to wrangle unruly data, build predictive models, and visualize insights that help humans make better decisions.

That car accident didn't end my story; it gave it a new, more focused direction. Now, I'm on a mission to help others find signal in the noise.

Selected Work: Driving Impact with Data

Sand Technology / Thames Water

Predicting River Health for Thames Water

Featured Project

1.5M+

Monthly Records

50+

Monitoring Stations

85%

Faster Reporting (5d→3h)

25%

↓ Incident Risk

Complex SQL Engineering: Wrote SQL queries to analyze data from 50+ stations and 12 field scientists, uncovering patterns in water quality and sewage discharge risks.

Dashboard Automation: Built interactive Power BI dashboards for 30+ field teams, slashing manual reporting time from 5 days to 3 hours.

Data Hygiene: Championed data hygiene by validating datasets and correcting 8,500+ inconsistencies, ensuring stakeholders could trust the insights.

Predictive Intelligence: Identified trends that enabled predicting sewage discharge events up to 48 hours in advance, leading to proactive interventions.

WeThinkCode_ (2023 Cohort)

Technical Mentor

Chosen among a select group of mentors to guide and shape the career paths of future tech professionals.

  • Selected to serve as a Technical Mentor for the 2023 Cohort, contributing expertise in Python programming.
  • Provided guidance, support, and insights to students, fostering their technical and professional growth.
  • Collaborated with fellow mentors to create impactful learning experiences and workshops.
  • Facilitated induction and training sessions to equip students with essential skills for success in the tech industry.

Data Playground: Real Problems, Real Code

MACHINE LEARNING · REST API · STREAMLIT

South Africa Public Procurement Intelligence System 🇿🇦

Overview: An end‑to‑end machine learning system that transforms raw e‑tender data (OCDS format) into actionable intelligence for suppliers, procurement officials, and policy analysts.

What It Does: Features an interactive Streamlit Dashboard with real‑time ML predictions, a REST API with 10 endpoints for programmatic access, and an Automated Excel Reporter. The machine learning pipeline utilizes Random Forest, XGBoost, LightGBM, and an ensemble for contract value forecasting, alongside 7 governance red‑flag anomaly detectors.

"No Power BI required – the built‑in interactive dashboard and automated Excel reports fully meet all analytical needs."

Key Outcomes

  • 🚀 Identified top opportunities with confidence scores
  • 🚩 Flagged 37% of contracts with governance risks
  • 💰 Delivered median‑anchored contract value predictions
  • ⚙️ Automated Scheduler for weekly retraining & updates

Impact

  • Empowers SMEs to make data‑driven bidding decisions
  • Helps detect procurement irregularities
  • Provides analysts with transparent, reproducible intelligence
Python · MySQL · Scikit-learn · XGBoost & LightGBM · Streamlit · FastAPI · openpyxl

DATA SCIENCE · ML · PREDICTIVE ANALYTICS

Predictive Crime Hotspot Modeling

Overview: Built a machine learning solution to forecast future crime counts at police station level, enabling proactive resource allocation and early identification of high-risk crime hotspots.

This project focuses on predicting future crime trends using historical SAPS station-level data (2008–2023). I designed an end-to-end data pipeline starting with SQL-based extraction and aggregation, followed by advanced feature engineering. I developed and combined three models—Random Forest, XGBoost, and LightGBM—into an ensemble to capture temporal and spatial patterns.

"Achieved high predictive accuracy with MAE ~200 crimes per station-year, enabling actionable forecasting for real-world policing strategies."
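
A minimal sketch of the two ideas above, combining model outputs and scoring them with mean absolute error (MAE). It assumes an equal-weight average of the three models, which is only one possible way to build an ensemble; the weighting actually used in the project is not stated here. The function names and all numbers are hypothetical and chosen for illustration.

```python
from statistics import mean


def ensemble_average(predictions_by_model):
    """Average predictions across models, position by position.

    Equal weights are an assumption made for this sketch.
    """
    model_outputs = list(predictions_by_model.values())
    return [mean(values) for values in zip(*model_outputs)]


def mean_absolute_error(actual, predicted):
    """Mean of the absolute differences between actual and predicted counts."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical yearly crime counts for three stations (not project data).
predictions_by_model = {
    "random_forest": [1200.0, 300.0, 4500.0],
    "xgboost": [1100.0, 360.0, 4800.0],
    "lightgbm": [1300.0, 330.0, 4200.0],
}
actual = [1100.0, 430.0, 4400.0]

ensemble = ensemble_average(predictions_by_model)
mae = mean_absolute_error(actual, ensemble)
print("Ensemble forecast:", ensemble)
print("MAE (crimes per station-year):", mae)
```

MAE is expressed in the same unit as the target, so a value of about 200 reads directly as "off by roughly 200 crimes per station per year on average".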

Key Features

  • Forecasted station-level crime with ML ensemble
  • Engineered lag, rolling averages & clustering
  • Identified high-risk stations (top 20%)
  • Generated 2024 forecasts for planning
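
The lag and rolling-average features and the top-20% cut listed above can be sketched in plain Python as follows. This is an illustrative sketch, not the project's pipeline: the helper names, the three-year window, and the sample figures are all assumptions. Each feature uses only earlier years for the same station, so the value being predicted never leaks into its own inputs.

```python
from collections import defaultdict
from statistics import mean


def add_lag_and_rolling(rows, window=3):
    """Add the previous year's count and a trailing mean of prior years, per station."""
    history = defaultdict(list)  # station -> counts seen so far, in year order
    enriched = []
    for row in sorted(rows, key=lambda r: (r["station"], r["year"])):
        past = history[row["station"]]
        enriched.append({
            **row,
            "lag_1": past[-1] if past else None,
            "rolling_mean": mean(past[-window:]) if past else None,
        })
        past.append(row["crimes"])
    return enriched


def top_share(forecasts, share=0.2):
    """Return the stations in the top `share` of forecast counts (at least one)."""
    ranked = sorted(forecasts, key=forecasts.get, reverse=True)
    keep = max(1, round(len(ranked) * share))
    return ranked[:keep]


# Hypothetical history for one station (not project data).
rows = [
    {"station": "A", "year": 2020, "crimes": 100},
    {"station": "A", "year": 2021, "crimes": 120},
    {"station": "A", "year": 2022, "crimes": 140},
    {"station": "A", "year": 2023, "crimes": 160},
]
features = add_lag_and_rolling(rows)
print(features[-1])

# Hypothetical forecasts for five stations.
forecasts = {"A": 500, "B": 900, "C": 100, "D": 300, "E": 700}
high_risk = top_share(forecasts)
print(high_risk)
```

The first year for each station has no history, so its features are left as `None` rather than filled with a guess.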

Business Impact

  • Enables proactive vs reactive policing
  • Efficient resource & patrol allocation
  • Early identification of emerging hotspots
  • Data-driven public safety strategy
Python · XGBoost & LightGBM · SQL · Power BI · Machine Learning · Predictive Modeling · Crime Analytics · Geospatial
Explore Project Repository

IMS StepUp SA — Marketing Analytics Dashboard

End-to-end marketing analytics project for a South African footwear retailer. Built a 4-page executive dashboard in Power BI covering campaign ROI, store & product sales, customer segmentation and CLV, and website performance (GA4) — powered by Python data cleaning and DAX measures across 24 months of real business data.

Python · Power BI · DAX · Pandas · Data Cleaning · Customer Segmentation · Marketing Analytics
View on GitHub

Vegetable-Prices

Vegetable Price Analysis and Forecasting – analyzes historical vegetable prices across various regions to identify key trends and regional disparities using machine learning techniques.

Python · Jupyter · Forecasting · EDA
View on GitHub

Pizza-Sales

Interactive Pizza Sales Analysis dashboard built with Tableau.

Tableau · Data Visualization · Sales Analytics
View on GitHub

Supermarket-sales

Analyzes 3-month sales data from 3 supermarket branches. Uses MySQL for data cleaning, Power BI for visualization, and predictive analytics to identify trends and forecast sales.

MySQL · Power BI · Python · Predictive Modeling
View on GitHub

House-Prices

Predicts house sale prices using feature engineering, random forests, and gradient boosting.

Python · Scikit-learn · Regression · Feature Engineering
View on GitHub

Tools & Technologies

Data Querying

Advanced SQL · CTEs & Window Functions · BigQuery · Database Exploration

Data Science

Python · Pandas & NumPy · EDA · Machine Learning · Stats

Visualization

Power BI · Tableau · Looker Studio · Matplotlib · Excel

Cloud & Core

AWS (S3) · Git · Storytelling · Data Cleaning · Agile

Verified Credentials

SQL (Basic) - HackerRank
Git Course Intermediate - WildLearner
Python (Basic) - HackerRank
Java Programming - Sololearn
Introduction to SQL - Sololearn
Motheo 1.0 - Momentum
Python (Intermediate) - WildLearner
SQL Intermediate - Sololearn
Java (Basic) - HackerRank
Python Core - Sololearn
Java Course - Coursera
Python Programming - Sololearn

Continuous Learning

University of Fort Hare

Bachelor of Science (Computer Science & Applied Mathematics)

2018–2021

Coursework completed through September 2021. Achieved multiple distinctions, including 96% for Geometry and 81% for Computer Literacy.

WeThinkCode_

Software Engineering Programme | NQF Level 5

Intensive, peer-led learning focused on practical application. Final Score: 4.18.

ExploreAI Academy

Data Science Programme

Focused on applied data science, machine learning, and advanced analytics.

QCTO Statement of Results (118708)

Completed multiple Data Science modules (PM-02 to PM-10, WM-01 to WM-04).

Ready to find clarity in the chaos?

Let's turn your raw data into actionable insights. I am currently open to opportunities in South Africa (Hybrid/Remote/On-site).