Naturally occurring arsenic (As) contamination, magnified by anthropogenic activities, threatens millions of people across South and Southeast Asia who rely on groundwater for drinking water. The spatial extent of As contamination in India remains poorly defined, and its recognition in new regions currently depends on slow, uncoordinated, and unsystematic testing of groundwater samples. This thesis reports results from the first all-India ensemble predictive model of As in groundwater. The machine-learning model is trained on a dataset comprising 14,185 As data points and 22 proxy variables representing the geological, topographical, and environmental conditions that influence As release. The model then predicts the geographical regions of groundwater As hazard using the World Health Organization's provisional guideline of 10 μg/L for As in drinking water. The significant predictor variables used in the model broadly show coherent correlations with the processes responsible for mobilizing As in groundwater. By overlaying population density on these predictions, the project estimates that 130 million rural Indians are at high to moderate risk from As in groundwater. The model also identifies several new regions in India where little or no testing has been conducted. The predictive As map, which identifies high-risk regions, can guide policymakers in prioritizing resources for field testing and in subsequent efforts to mitigate the devastating health impacts of As poisoning.
Keywords: Machine Learning, Ensemble, Prediction, Arsenic.
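
The exposure estimate summarized in the abstract, in which population density is overlaid on predicted hazard, can be illustrated with a minimal sketch. Only the 10 μg/L guideline comes from the text. The function names, the strict "greater than" labeling rule, the probability cutoffs for "high" and "moderate" risk, and the example grid cells are all hypothetical and are not taken from the thesis.

```python
# Minimal sketch of a population-at-risk overlay (illustrative only).

WHO_GUIDELINE_UG_PER_L = 10.0  # provisional WHO guideline cited in the abstract


def label_exceedance(as_ug_per_l, threshold=WHO_GUIDELINE_UG_PER_L):
    """Binary label for a groundwater sample: 1 if As exceeds the threshold.

    Assumption: "exceeds" is read as strictly greater than the guideline.
    """
    return 1 if as_ug_per_l > threshold else 0


def risk_class(prob, high=0.7, moderate=0.4):
    """Map a predicted exceedance probability to a risk class.

    The cutoffs 0.7 and 0.4 are hypothetical placeholders.
    """
    if prob >= high:
        return "high"
    if prob >= moderate:
        return "moderate"
    return "low"


def population_at_risk(cells):
    """Sum rural population per risk class.

    `cells` is an iterable of (predicted_probability, rural_population)
    pairs, one pair per grid cell of the hazard map.
    """
    totals = {"high": 0, "moderate": 0, "low": 0}
    for prob, population in cells:
        totals[risk_class(prob)] += population
    return totals


# Made-up grid cells: (predicted probability of exceedance, rural population)
example_cells = [(0.85, 1200), (0.55, 800), (0.10, 5000), (0.72, 300)]
totals = population_at_risk(example_cells)
print(totals)
```

In this sketch the people counted as being at high to moderate risk are those in the "high" and "moderate" classes combined. A real analysis would use gridded model output and census-derived population rasters rather than a hand-written list.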