Mineral Processing Analysis with Python

Bryan Eckard
Feb 27, 2023

Overview

In ore mining, one of the most widely used processes for extracting minerals is flotation. In its basic form, flotation makes fine particles of the target mineral hydrophobic (water-repelling), which causes them to attach to air bubbles and rise into the froth. The froth is removed and contains a higher concentration of the mineral (Britannica, Flotation | ore dressing | Britannica). In this analysis, I looked at data collected during a flotation process for iron ore. Part of this process is removing silica from the iron concentrate to obtain a purer product.

In this project, I take on the role of a new data analyst at a mining company. As the analysis below shows, the data display the expected relationship between the iron and silica concentrates over the six months covered: as iron concentration increases, silica concentration decreases.

The Data

This dataset can be found on Kaggle, Quality Prediction in a Mining Process | Kaggle. It contains real plant data collected from March 2017 to September 2017, and each row is a time point at 20-second intervals. The dataset has 737,453 rows and 24 columns, which I determined using the shape attribute in Python.

Shape attribute and result in Python.
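A minimal illustration of the shape attribute, using a made-up three-row frame in place of the real file (the column names are assumed to mirror those in the Kaggle dataset). shape is a (rows, columns) tuple and is accessed without parentheses:

```python
import pandas as pd

# Made-up stand-in for the loaded dataset; the real frame has 24 columns.
df = pd.DataFrame({
    "date": ["2017-03-10 01:00:00"] * 3,
    "% Iron Concentrate": [66.5, 66.4, 66.2],
    "% Silica Concentrate": [1.5, 1.6, 1.8],
})

# shape is an attribute, not a method, and holds a (rows, columns) tuple.
n_rows, n_cols = df.shape
print(f"{n_rows:,} rows, {n_cols} columns")  # 3 rows, 3 columns
```

Run against the full dataset, the same attribute reports the 737,453 rows and 24 columns given above; len(df) returns only the row count.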

The Analysis

For this analysis, I used the Python libraries Pandas, Seaborn, and Matplotlib. Some initial cleaning was needed because the original dataset uses commas instead of periods as decimal points. This was handled within the read_csv function.

Reading in csv file and changing to decimal.

Previewing changes to dataset.
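One way to handle the comma decimals is read_csv's decimal argument. This sketch assumes the Kaggle file quotes every field and writes decimals like "55,2" (an assumption about the file layout), reproduced here with a few made-up rows in an in-memory string so it runs without the download:

```python
import io

import pandas as pd

# Made-up rows in the assumed layout: every field quoted, comma as decimal mark.
SAMPLE_CSV = '''"date","% Iron Feed","% Iron Concentrate","% Silica Concentrate"
"2017-04-01 00:00:00","55,2","65,5","2,5"
"2017-06-01 00:00:00","56,1","66,0","2,0"
'''

# Without decimal=",", a value like "55,2" is not recognised as a number
# and the column stays as strings (object dtype).
raw = pd.read_csv(io.StringIO(SAMPLE_CSV))

# decimal="," tells the parser to treat the comma as the decimal point.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), decimal=",")

print(raw.dtypes["% Iron Feed"])  # object
print(df.dtypes["% Iron Feed"])   # float64
print(df.head())
```

With the downloaded file, the call would look like pd.read_csv("MiningProcess_Flotation_Plant_Database.csv", decimal=","); that filename is an assumption, so adjust it to wherever the file was saved.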

Also, the date column needed to be converted to a datetime series so I could analyze some of the variables over time.

to_datetime function
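A sketch of the conversion, assuming the date strings look like "2017-04-01 00:00:00" (the sample frame is hypothetical). Once the column has a datetime type, the .dt accessor exposes fields such as the month, which makes time-based filtering possible:

```python
import pandas as pd

# Hypothetical miniature of the loaded frame: "date" still holds plain strings.
df = pd.DataFrame({
    "date": ["2017-04-01 00:00:00", "2017-06-15 12:00:00", "2017-08-31 23:00:00"],
    "% Iron Concentrate": [65.5, 66.0, 64.8],
})

# Parse the strings into datetimes; an explicit format is faster and
# raises an error on values that do not match it.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d %H:%M:%S")

# The .dt accessor now exposes calendar fields such as the month.
df["month"] = df["date"].dt.month
print(df["month"].tolist())  # [4, 6, 8]
```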

After this initial cleaning, I started looking for insights in the data and inspecting it for anomalies.

First, I found some basic summary statistics for the whole DataFrame.

Describe function for basic stats.

Two images showing describe statistics.
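A small example of what describe() returns, using made-up values in two columns. Each numeric column gets its count, mean, standard deviation, minimum, quartiles and maximum:

```python
import pandas as pd

# Made-up values standing in for two columns of the real dataset.
df = pd.DataFrame({
    "% Iron Concentrate": [64.0, 65.0, 66.0, 67.0],
    "% Silica Concentrate": [3.0, 2.5, 2.0, 1.5],
})

# One column per variable; rows are count, mean, std, min, 25%, 50%, 75%, max.
stats = df.describe()
print(stats.round(2))

# A single statistic can be read by row and column label.
print(stats.loc["mean", "% Iron Concentrate"])  # 65.5

# With many columns the table gets wide; transposing puts one variable per row.
print(stats.T.round(2))
```

With all 24 columns the untransposed table is wide enough to need scrolling or several screenshots, so stats.T is often easier to read.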

Then, I created a DataFrame for each of the months April, June, and August, along with a list of the columns most important for analyzing the process.

DataFrames to analyze the months of April, June, and August. June comes first because I had not yet chosen the other months, and it falls in the middle of the period.

DataFrame with just the important columns.
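One way to build the monthly DataFrames, sketched under two assumptions: the date column has already been converted to datetime, and the list of important columns is hypothetical (the article does not say which columns it kept). Because every row falls in 2017, filtering on the month number alone is enough; data spanning several years would also need a year filter.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned frame, with "date" already datetime.
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-04-02", "2017-04-20", "2017-06-05",
                            "2017-07-01", "2017-08-10", "2017-08-30"]),
    "Ore Pulp pH": [10.0, 9.9, 10.1, 10.0, 9.8, 10.2],
    "% Iron Concentrate": [65.0, 65.4, 66.1, 65.2, 64.9, 65.8],
    "% Silica Concentrate": [2.4, 2.1, 1.6, 2.3, 2.6, 1.9],
})

# Hypothetical selection of "important" columns.
important_cols = ["% Iron Concentrate", "% Silica Concentrate"]


def month_frame(frame, month):
    """Rows whose timestamp falls in the given calendar month, important columns only."""
    mask = frame["date"].dt.month == month
    return frame.loc[mask, important_cols]


april = month_frame(df, 4)
june = month_frame(df, 6)
august = month_frame(df, 8)
print(len(april), len(june), len(august))  # 2 1 2
```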

I then computed the descriptive statistics for each month's DataFrame.

Descriptive statistics for April's important columns.

Descriptive statistics for June's important columns.

Descriptive statistics for August's important columns.
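Instead of reading three separate tables, the monthly summaries can be placed side by side. This sketch uses made-up monthly frames. It passes a dict to pd.concat, whose keys become the top level of the column labels, and then uses DataFrame.xs to pull out one variable for comparison:

```python
import pandas as pd

# Made-up monthly frames standing in for April, June and August.
months = {
    "April": pd.DataFrame({"% Iron Concentrate": [65.0, 65.4, 65.8],
                           "% Silica Concentrate": [2.4, 2.1, 1.8]}),
    "June": pd.DataFrame({"% Iron Concentrate": [66.1, 66.5, 65.9],
                          "% Silica Concentrate": [1.6, 1.3, 1.9]}),
    "August": pd.DataFrame({"% Iron Concentrate": [64.9, 65.8, 65.3],
                            "% Silica Concentrate": [2.6, 1.9, 2.2]}),
}

# One describe() table per month, concatenated column-wise; the dict keys
# become the outer level of a two-level column index.
summary = pd.concat({name: frame.describe() for name, frame in months.items()}, axis=1)

# xs selects one variable across all months (dropping that column level),
# and .loc["mean"] keeps only the mean row.
mean_iron = summary.xs("% Iron Concentrate", axis=1, level=1).loc["mean"]
print(mean_iron.round(2))
```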

I wanted to see whether any of these columns were correlated with each other, so I created pairplots using the Seaborn library and computed correlations with the Pandas corr method. The following are for the whole dataset, April, June, and August, respectively.

Pairplot and correlations for whole dataset.

Pairplot and correlations for April.

Pairplot and correlations for June.

Pairplot and correlations for August.
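A sketch of both steps on synthetic data (not the real measurements), generated so that silica falls as iron rises. DataFrame.corr computes Pearson coefficients by default. The pairplot helper is defined but not called here, and it imports Seaborn and Matplotlib inside the function so the correlation part runs even where those libraries are not installed:

```python
import numpy as np
import pandas as pd


def correlation_table(frame):
    """Pearson correlation matrix of the numeric columns (pandas' default method)."""
    return frame.corr(numeric_only=True)


def draw_pairplot(frame, title):
    """Scatter matrix of every pair of columns, with histograms on the diagonal."""
    # Imported here so correlation_table works without the plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    grid = sns.pairplot(frame)
    grid.figure.suptitle(title, y=1.02)
    plt.show()


# Synthetic data: silica decreases linearly with iron, plus random noise.
rng = np.random.default_rng(0)
iron = rng.normal(65.0, 1.0, size=500)
silica = 20.0 - 0.27 * iron + rng.normal(0.0, 0.2, size=500)
df = pd.DataFrame({"% Iron Concentrate": iron, "% Silica Concentrate": silica})

corr = correlation_table(df)
print(corr.round(2))
```

Seaborn draws every point, so on a frame of roughly 737,000 rows, passing a random sample such as df.sample(5000, random_state=0) to draw_pairplot keeps the plot responsive, while the correlations can still use all rows.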

Final Thoughts

What do all these numbers mean? The means are all fairly close, except that June's mean Flotation Column 5 Level is approximately 22 mm higher. Also, because removing silica raises the iron concentration, we see a strong negative correlation between the iron and silica concentrates, with a coefficient of approximately -0.80 across the whole dataset. The coefficient is approximately -0.85 in April and -0.82 in August, but it weakens to -0.72 in June. None of the other correlations with the iron concentrate are very strong. However, June again stands out, showing a negative correlation where the other periods show positive ones.
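The monthly comparisons above can be computed in one pass by grouping on the calendar month. This sketch runs on synthetic daily data (not the plant measurements), so its numbers will not match the article's. The column names, including "Flotation Column 05 Level", are assumed to follow the Kaggle file's headers:

```python
import numpy as np
import pandas as pd

# Synthetic daily data (NOT the real dataset) covering the same period.
# Silica is built to fall as iron rises, so the correlation sign matches the article.
rng = np.random.default_rng(42)
dates = pd.date_range("2017-03-10", "2017-09-09", freq="D")
iron = rng.normal(65.0, 1.0, size=len(dates))
df = pd.DataFrame({
    "date": dates,
    "% Iron Concentrate": iron,
    "% Silica Concentrate": 20.0 - 0.27 * iron + rng.normal(0.0, 0.2, size=len(dates)),
    "Flotation Column 05 Level": rng.normal(450.0, 20.0, size=len(dates)),
})

by_month = df.groupby(df["date"].dt.month)

# Pearson r between iron and silica concentrate, computed within each month.
monthly_r = pd.Series({
    month: g["% Iron Concentrate"].corr(g["% Silica Concentrate"])
    for month, g in by_month
})
print(monthly_r.round(2))

# How far June's mean level sits from the average of the other months' means.
level_means = by_month["Flotation Column 05 Level"].mean()
june_gap = level_means.loc[6] - level_means.drop(6).mean()
print(f"June vs. other months: {june_gap:+.1f} mm")
```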

Since June shows several larger deviations in the statistics, I would advise my supervisor that further investigation is needed to determine whether any change in the process or personnel contributed. Identifying the cause would help prevent a recurrence, so that the highest concentrations of iron are consistently collected.

Thank you for reading! If you have any questions, feel free to comment below, reach out at my email (bryaneckarddata@gmail.com), or connect with me on LinkedIn Bryan Eckard.