Exploration of Efficient Pre-training Techniques for Speech and Audio Large Language Models