Associated with: University of California, Berkeley
Class: DATA 100: Principles and Techniques in Data Science.
Built an email classifier that labels messages as spam or ham. The baseline model had an accuracy of 0.85. After using cross-validation for feature and model selection and taking steps to prevent overfitting, the final model had an accuracy of 0.92.
Learnings:
- EDA techniques
- Feature Engineering techniques
- Modeling
- Evaluating a Logistic Regression model