Here is the 8-step process for any ML/analytics project in an industrial use case.
| Step 1: Understand the real objectives and the big picture. Model building may not be the end goal! Who are the customers/users? How will the models be used? What is the current approach or solution (if any), what are its limitations, how much effort does it take, and is it manual or automated? These points need to be clarified in order to frame the problem and decide on algorithm options, performance metrics, and the effort involved. With these details, the data scientist can start designing the framework and solution: unsupervised, semi-supervised, supervised, or reinforcement learning; anomaly detection, classification, or regression; batch or online learning; univariate or multivariate. Options for measuring performance include expert validation and metrics that depend on the problem, such as RMSE, MAE, etc. |
| Step 2: Get the data. Where is the training data located: cloud buckets or local computers? How is the data accessed: direct download from the cloud, an API, or offline? Understand the metadata, data structure, and data types. Is it raw data, processed data, or features only? Note that when you are dealing with varying data, offline access or a download option may not be suitable, as accessing all sorts of data will be time-consuming. It's better to have a playground where you have access to all the data and can run your code. |
| Step 3: Exploratory data analysis. Study distributions, trends, patterns, and correlations in depth using advanced visualization tools. Derive features. |
| Step 4: Data preparation. Build transformation and feature functions. Filter, clean, fill, and scale the data. |
| Step 5: Model training and selection. Based on the problem type, try out multiple models, algorithms, and approaches. Have a platform where multiple algorithms can be tried out and compared easily. Use cross-validation. Keep a human in the loop. |
| Step 6: Model fine-tuning. Once models are shortlisted, fine-tuning is required. Use grid search to find optimal hyperparameters. Try ensemble methods (multi-model). Evaluate the best-performing models and their errors thoroughly. Keep a human in the loop. |
| Step 7: Solution presentation. Start with the big picture. Summarize and highlight what you have done. Explain how the solution will help the business. Share interesting findings and observations. Highlight assumptions. |
| Step 8: Deploy and monitor. Connect to production data. Set up a monitoring system to track concept drift and performance degradation. |
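To make the tuning and monitoring steps a bit more concrete, here is a minimal sketch in plain Python of grid search with cross-validation (Steps 5 and 6) followed by a simple performance-degradation check (Step 8). Everything in it is an assumption made for illustration: the toy one-feature ridge model, the synthetic data, the helper names (`fit_ridge_1d`, `cv_rmse`, `grid_search`, `degraded`), and the `tolerance` of 1.5. It also assumes that true values eventually become available in production, which is not always the case. In a real project you would most likely rely on a library rather than hand-rolled loops.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error, one of the metrics named in Step 1."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form 1-D ridge regression without intercept (toy model)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def cv_rmse(xs, ys, alpha, k=5):
    """Mean RMSE over k folds; fold i holds out every k-th sample starting at i."""
    scores = []
    for i in range(k):
        train = [(x, y) for j, (x, y) in enumerate(zip(xs, ys)) if j % k != i]
        test = [(x, y) for j, (x, y) in enumerate(zip(xs, ys)) if j % k == i]
        w = fit_ridge_1d([x for x, _ in train], [y for _, y in train], alpha)
        scores.append(rmse([y for _, y in test], [w * x for x, _ in test]))
    return sum(scores) / k


def grid_search(xs, ys, alphas, k=5):
    """Return (best_alpha, best_score) by exhaustive search over the grid."""
    return min(((a, cv_rmse(xs, ys, a, k)) for a in alphas), key=lambda pair: pair[1])


def degraded(baseline_rmse, y_true, y_pred, tolerance=1.5):
    """Flag degradation when production RMSE exceeds tolerance * baseline."""
    return rmse(y_true, y_pred) > tolerance * baseline_rmse


# Synthetic training data: y = 2x + noise (illustrative only).
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]

# Step 6: pick the hyperparameter with the lowest cross-validated RMSE.
best_alpha, baseline = grid_search(xs, ys, alphas=[0.0, 1.0, 100.0, 10000.0])
model_w = fit_ridge_1d(xs, ys, best_alpha)

# Step 8: one production batch follows the training relationship,
# the other has drifted to y = 3x.
prod_x = [rng.uniform(0, 10) for _ in range(100)]
same_y = [2.0 * x + rng.gauss(0, 1) for x in prod_x]
drift_y = [3.0 * x + rng.gauss(0, 1) for x in prod_x]
preds = [model_w * x for x in prod_x]

alert_same = degraded(baseline, same_y, preds)
alert_drift = degraded(baseline, drift_y, preds)
print(best_alpha, round(baseline, 3), alert_same, alert_drift)
```

The cross-validated RMSE from tuning doubles as the baseline for monitoring, so the alert threshold is tied to what the model achieved on held-out data rather than to an arbitrary number. How tight the tolerance should be is a judgment call, and a good place to keep a human in the loop.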
DOCUMENT ALL YOUR WORK WITH SUMMARIES.