For my project on translating live sign language video into text, I began by researching software tools for capturing the data needed to build a training dataset. I settled on MediaPipe, an open-source framework that extracts detailed landmark data for hand movements, body posture, and facial expressions.
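As a rough illustration of the kind of data MediaPipe exposes, here is a minimal sketch that runs the Holistic solution on a single frame using the legacy Solutions API; the image path is a placeholder, and the choice of Holistic (rather than the separate Hands, Pose, and FaceMesh solutions) is an assumption, not a record of my exact setup.

```python
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

# "frame.jpg" is a placeholder image; MediaPipe expects RGB input, OpenCV loads BGR.
frame = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)

with mp_holistic.Holistic(static_image_mode=True) as holistic:
    results = holistic.process(frame)

# Each landmark carries normalized x, y, z coordinates.
if results.right_hand_landmarks:
    print(len(results.right_hand_landmarks.landmark))  # 21 hand landmarks
if results.pose_landmarks:
    print(len(results.pose_landmarks.landmark))         # 33 body-pose landmarks
if results.face_landmarks:
    print(len(results.face_landmarks.landmark))          # 468 face-mesh landmarks
```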
After selecting MediaPipe, I used OpenCV to record video clips of the American Sign Language (ASL) alphabet, then ran MediaPipe over each clip to extract the landmark data points from every frame. These data points formed the basis of the training dataset.
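The sketch below condenses that capture pipeline into one loop, assuming a webcam feed, the MediaPipe Hands solution, and a flat CSV of landmark coordinates labeled with the letter being signed; the label, output path, and keyboard controls are illustrative placeholders rather than my actual configuration.

```python
import csv
import cv2
import mediapipe as mp

LABEL = "A"                    # placeholder: the ASL letter recorded in this session
OUTPUT = "asl_landmarks.csv"   # placeholder output path

mp_hands = mp.solutions.hands
cap = cv2.VideoCapture(0)      # default webcam

with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands, \
        open(OUTPUT, "a", newline="") as f:
    writer = csv.writer(f)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV captures BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            hand = results.multi_hand_landmarks[0]
            # Flatten the 21 landmarks into a 63-value feature row plus the label.
            row = [coord for lm in hand.landmark for coord in (lm.x, lm.y, lm.z)]
            writer.writerow(row + [LABEL])
        cv2.imshow("recording", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to stop recording
            break

cap.release()
cv2.destroyAllWindows()
```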
With the training data in hand, I experimented with several machine learning models to improve interpretation accuracy, including K-Nearest Neighbors (KNN), Naive Bayes, Random Forest, and Sequential neural network models. Each model offered different strengths and insights that helped refine my approach to translating sign language into text accurately and reliably.
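A sketch of how such a comparison might be set up, assuming the landmark CSV from the capture step, standard scikit-learn implementations of the first three models, and a Keras-style Sequential network for the fourth; the hyperparameters, layer sizes, and train/test split are assumptions for illustration, not my tuned configuration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from tensorflow import keras

# Landmark rows from the capture step: 63 coordinates followed by the letter label.
data = pd.read_csv("asl_landmarks.csv", header=None)
X, y = data.iloc[:, :-1].values, data.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Classical baselines: KNN, Naive Bayes, Random Forest.
for name, model in [
    ("KNN", KNeighborsClassifier(n_neighbors=5)),
    ("Naive Bayes", GaussianNB()),
    ("Random Forest", RandomForestClassifier(n_estimators=100, random_state=42)),
]:
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))

# Sequential network over the same features; letter labels are integer-encoded.
le = LabelEncoder()
y_train_idx = le.fit_transform(y_train)
y_test_idx = le.transform(y_test)
net = keras.Sequential([
    keras.layers.Input(shape=(X.shape[1],)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(len(le.classes_), activation="softmax"),
])
net.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
net.fit(X_train, y_train_idx, epochs=20, batch_size=32, verbose=0)
print("Sequential", net.evaluate(X_test, y_test_idx, verbose=0)[1])
```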