A Memory-Centric Customizable Domain-Specific FPGA Overlay for Accelerating Machine Learning Applications