GymLytics: AI-powered Realtime Workout Analytics

Akshay Bahadur 👨‍🚀
10 min read · Jan 12, 2024

Through this blog, I want to highlight my project that combines physical training with Machine Learning to generate insights from a workout. This is an account of the entire experiment, from formulating the problem statement to analyzing the results.

Abstract 💡

In today’s tech-driven world, fitness and technology are joining forces to create innovative solutions for health and wellness. I took a stab at it through my project, GymLytics — an open-source GitHub project that leverages the power of computer vision and machine learning to revolutionize how we track and monitor our workouts.

GymLytics uses computer vision and machine learning to track your body movements throughout a workout, correct your posture, keep tabs on your reps, and help ensure you achieve your fitness goals safely and effectively.

Code: https://github.com/akshaybahadur21/GymLytics

GymLytics | Pushup

Preface 🖼️

Traditional workout tracking methods often involve manually entering exercise details, sets, and repetitions into apps or notebooks. However, this process has several limitations, including potential inaccuracies and disruption of the workout flow. Manual tracking can lead to errors, hinder motivation, and impede the overall effectiveness of fitness routines.

Furthermore, a gym user has to depend on a gym trainer or an experienced gym-goer for posture correction. Several users might develop injuries due to the absence of proper guidance.

Importance of a correct posture

Mediapipe 🚰

To understand how the module works, we first need to discuss the Mediapipe library, which enables developers like me to integrate advanced computer vision capabilities into a system seamlessly.

Mediapipe

Google’s MediaPipe is a cutting-edge open-source framework that transforms the landscape of real-time multi-modal solutions. Offering a versatile set of tools, from hand tracking to pose estimation, MediaPipe empowers developers to create immersive applications across diverse industries. Its seamless integration with TensorFlow, efficient inference models, and cross-platform compatibility make it a go-to choice for projects ranging from augmented reality experiences to health and fitness applications.

Google Mediapipe

MediaPipe’s strength lies in its ability to handle diverse tasks across multiple modalities. Here are some of the key functionalities that make MediaPipe stand out:

  • Hand Tracking: MediaPipe’s hand tracking module enables real-time tracking of hand movements in video streams. This is particularly useful for gesture recognition, sign language interpretation, and virtual touch interfaces.
Mediapipe hand tracking
  • Face Detection and Recognition: The face detection and recognition capabilities of MediaPipe allow developers to build applications that analyze facial features, expressions, and even recognize individuals. This is fundamental for applications ranging from augmented reality filters to biometric authentication systems.
  • Pose Estimation: MediaPipe’s pose estimation module accurately tracks human body poses in real time. This is invaluable for applications in fitness tracking, gesture-based interfaces, and animation. (A minimal usage sketch follows this list.)
Gesture Recognition using ML
  • Holistic: The Holistic module combines face, hand, and pose tracking to provide a holistic understanding of human activities. This is particularly beneficial for applications that require a comprehensive understanding of user interactions, such as fitness coaching or immersive gaming.
Google + Mediapipe = ❤️
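
To make the pose-estimation piece concrete, here is a minimal sketch of running MediaPipe’s Pose solution on a single OpenCV frame. It uses the mp.solutions API that GymLytics also relies on; the file name and confidence threshold below are just placeholders.

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
mp_drawing = mp.solutions.drawing_utils

image = cv2.imread("frame.jpg")  # placeholder image path for this sketch

with mp_pose.Pose(static_image_mode=True, min_detection_confidence=0.5) as pose:
    # MediaPipe expects RGB input, while OpenCV loads frames as BGR
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.pose_landmarks:
    # Draw the detected skeleton back onto the frame and save it
    mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)
    cv2.imwrite("frame_annotated.jpg", image)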

Modules 🔥

At its core, GymLytics uses a combination of cameras and machine learning algorithms to automatically analyze exercises performed during a workout and share feedback regarding posture correction and rep completion. These capabilities are achieved through a systematic pipeline that encompasses these submodules:

  • Camera Setup: GymLytics requires one or more cameras strategically placed in the workout area to capture your movements. Alternatively, users can also record a video and use it as a source for the pipeline to analyze their workout. For this experiment, I used both a laptop webcam (for real-time analysis) and an iPhone 11 camera (for post-workout analysis).
Realtime Workout Analysis
  • Keypoint Detection: GymLytics utilizes a pre-trained MediaPipe landmark model based on a Convolutional Neural Network. The model accepts incoming video frames from the webcam, resized to 256 × 256 RGB, and outputs a result of shape [33, 5]: 33 key points, each with X, Y, and Z coordinates along with visibility and presence values. (See the coordinate-mapping sketch after this list.)
Mediapipe Keypoint Detection
  • Exercise Classification: As of writing this blog, I haven’t implemented the exercise classification module. GymLytics only supports manual exercise selection, based on which the application starts analyzing the video feed. However, using the detected key points, we could train an exercise classifier to automatically identify the exercise before sending the feed down the pipeline for further analysis.
Exercise classification
  • Posture Correction: GymLytics’ posture correction uses the key points to calculate limb angles and compare them against the pre-recorded benchmarks. This step identifies whether the person has the correct posture for the exercise. Additionally, the key points aid in accurately counting the number of reps performed for each workout.
Posture Correction
  • Data Logging: GymLytics uses the body key points to detect the correctness of posture and stores the count of correct exercise repetitions. Although not currently implemented, it can also log your workout data, making it accessible for review and analysis after the session.
Mock mobile App | GymLytics
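
As mentioned in the keypoint-detection step, MediaPipe returns each landmark in normalized image coordinates. Below is a hedged sketch of mapping those landmarks to pixel coordinates, similar in spirit to the get_idx_to_coordinates helper used later in the Snippets section (the real helper’s signature and filtering may differ).

def landmarks_to_pixels(image, results, visibility_threshold=0.5):
    """Map visible pose landmarks to (x, y) pixel coordinates, keyed by index."""
    idx_to_coordinates = {}
    if results.pose_landmarks is None:
        return idx_to_coordinates
    height, width, _ = image.shape
    for idx, landmark in enumerate(results.pose_landmarks.landmark):
        if landmark.visibility < visibility_threshold:
            continue  # skip landmarks the model is not confident about
        idx_to_coordinates[idx] = (int(landmark.x * width), int(landmark.y * height))
    return idx_to_coordinates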

Implementation 👨‍🔬

The current implementation is done in Python using two main libraries — OpenCV and Mediapipe. OpenCV’s comprehensive image-processing libraries seamlessly integrate with MediaPipe’s real-time multi-modal solutions. Leveraging OpenCV’s pixel-level manipulations and feature extraction capabilities enhances the preprocessing phase, which is crucial for subsequent MediaPipe modules. For instance, precise hand tracking and pose estimation benefit from OpenCV’s robust image manipulation functions, ensuring accurate keypoint detection.

Supported Exercise types

  • Pushup
  • Squat
  • Lunges
  • Shoulder Taps
  • Plank

Source

  • ‘0’ for webcam
  • Any other source for a prerecorded video

Setup

  1. Run pip install -r requirements.txt to resolve the code dependencies.
  2. Either record the exercise you want to analyze or set up your webcam to stream your exercise at runtime.
  3. Select the type of exercise you want to perform (see the supported exercises above).
  4. Run the GymLytics.py file with your chosen configuration.

Execution

python3 GymLytics.py --type shouldertap --source resources/shouldertap_aks.mov
Shoulder Tap | GymLytics
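
The entry point wires these flags to the exercise modules. The actual GymLytics.py may be organized differently, but a minimal argparse-based dispatcher could look like this (class names and module paths below are assumptions, not the repository’s exact layout):

import argparse

# Assumed imports; the repository's module layout may differ
from exercises.Plank import Plank
from exercises.Pushup import Pushup

def main():
    parser = argparse.ArgumentParser(description="GymLytics workout analytics")
    parser.add_argument("--type", required=True,
                        choices=["pushup", "squat", "lunges", "shouldertap", "plank"])
    parser.add_argument("--source", default="0",
                        help="'0' for webcam, or a path to a pre-recorded video")
    args = parser.parse_args()

    source = 0 if args.source == "0" else args.source
    exercises = {"plank": Plank, "pushup": Pushup}  # remaining exercises wired up analogously
    exercise_cls = exercises.get(args.type)
    if exercise_cls is None:
        raise SystemExit(f"'{args.type}' is not wired up in this sketch")
    exercise_cls().exercise(source)

if __name__ == "__main__":
    main()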

Snippets ✂️

In this section, we will discuss some parts of the Python implementation in detail. Let’s have a look at the project structure.


├── GymLytics
│   ├── .github
│   ├── src
│   │   ├── exercises
│   │   │   ├── Pushup.py
│   │   │   ├── Plank.py
│   │   │   .
│   │   │   .
│   │   ├── ThreadedCamera.py
│   │   └── utils.py
│   ├── LICENSE
│   ├── GymLytics.py
│   ├── requirements.txt
│   └── readme.md

The two main parts of the code are:

  • utils.py: contains all the utility and helper functions used for the workout analysis.
  • exercises: this folder contains the implementation for the different exercises, i.e., pushups, squats, etc. Each Python file holds the exercise-specific image-processing logic and the corresponding machine-learning code for analyzing the workout.

For a better understanding, let’s have a look at the Plank exercise implementation in detail.

def exercise(self, source):
    threaded_camera = ThreadedCamera(source)
    eang1 = 0               # back angle (shoulder - hip - ankle)
    plankTimer = None       # timestamp of the last frame with correct posture
    plankDuration = 0       # total time spent in a correct plank
    while True:
        success, image = threaded_camera.show_frame()
        if not success or image is None:
            continue
        # Mirror the frame and convert BGR -> RGB before handing it to MediaPipe
        image = cv2.cvtColor(cv2.flip(image, 1), cv2.COLOR_BGR2RGB)
        results = pose.process(image)
        image.flags.writeable = True
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
        mp_drawing.draw_landmarks(
            image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS,
            landmark_drawing_spec=pose_landmark_drawing_spec,
            connection_drawing_spec=pose_connection_drawing_spec)
        idx_to_coordinates = get_idx_to_coordinates(image, results)

The above code snippet contains the first few lines of the function. Here, we initialize the variables that will be used later in the loop. eang1 stores the back angle, which should be greater than 170° for a correct plank position.

results = pose.process(image) sends each frame from the video source to the Mediapipe library to detect the pose keypoints.
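
The frames themselves arrive through ThreadedCamera (src/ThreadedCamera.py), which the loop polls via show_frame(). The repository’s implementation may differ, but a minimal threaded grabber along these lines keeps frame capture from blocking the analysis loop:

import threading
import cv2

class ThreadedCamera:
    """Reads frames on a background thread so capture never stalls the analysis."""

    def __init__(self, source=0):
        self.capture = cv2.VideoCapture(source)
        self.success, self.frame = False, None
        thread = threading.Thread(target=self._update, daemon=True)
        thread.start()

    def _update(self):
        # Continuously overwrite the latest frame in the background
        while self.capture.isOpened():
            self.success, self.frame = self.capture.read()

    def show_frame(self):
        # Hand the most recent (success, frame) pair to the caller
        return self.success, self.frame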

try:
    # shoulder - back - ankle
    if 11 in idx_to_coordinates and 23 in idx_to_coordinates and 27 in idx_to_coordinates:  # left side of the body
        cv2.line(image, idx_to_coordinates[11], idx_to_coordinates[23], thickness=6,
                 color=(255, 0, 0))
        cv2.line(image, idx_to_coordinates[23], idx_to_coordinates[27], thickness=6,
                 color=(255, 0, 0))
        eang1 = ang((idx_to_coordinates[11], idx_to_coordinates[23]),
                    (idx_to_coordinates[23], idx_to_coordinates[27]))
        cv2.putText(image, str(round(eang1, 2)),
                    (idx_to_coordinates[23][0] - 40, idx_to_coordinates[23][1] - 50),
                    fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                    fontScale=0.8, color=(0, 255, 0), thickness=3)
        cv2.circle(image, idx_to_coordinates[11], 10, (0, 0, 255), cv2.FILLED)
        cv2.circle(image, idx_to_coordinates[11], 15, (0, 0, 255), 2)
        cv2.circle(image, idx_to_coordinates[23], 10, (0, 0, 255), cv2.FILLED)
        cv2.circle(image, idx_to_coordinates[23], 15, (0, 0, 255), 2)
        cv2.circle(image, idx_to_coordinates[27], 10, (0, 0, 255), cv2.FILLED)
        cv2.circle(image, idx_to_coordinates[27], 15, (0, 0, 255), 2)
except:
    pass

The above code snippet calculates the back angle. We use key points 11 (left shoulder), 23 (left hip), and 27 (left ankle), draw lines between them, and compute the angle formed at the hip.

Note: Refer to the Mediapipe Keypoint Detection image to get information on all the keypoints.

eang1 = ang((idx_to_coordinates[11], idx_to_coordinates[23]),
            (idx_to_coordinates[23], idx_to_coordinates[27]))

This call returns the calculated angle, which is passed on to the next step of the pipeline.
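
The ang() helper lives in utils.py. Its exact implementation may differ, but conceptually it measures the angle formed at the hip by the shoulder-hip and hip-ankle segments; a sketch of that computation:

import math

def ang(segment_a, segment_b):
    """Angle in degrees between two segments, each given as ((x1, y1), (x2, y2))."""
    (ax1, ay1), (ax2, ay2) = segment_a          # shoulder -> hip
    (bx1, by1), (bx2, by2) = segment_b          # hip -> ankle
    v1 = (ax2 - ax1, ay2 - ay1)
    v2 = (bx2 - bx1, by2 - by1)
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        return 0.0
    between = math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
    return 180.0 - between                       # ~180° when the back is straight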

try:
    if eang1 > 170:
        # Correct posture: start (or keep) the timer and accumulate the hold time
        if plankTimer is None:
            plankTimer = time.time()
        plankDuration += time.time() - plankTimer
        plankTimer = time.time()
    else:
        # Posture broken: reset the timer
        plankTimer = None
    # Map the back angle onto a progress bar and a percentage readout
    bar = np.interp(eang1, (120, 170), (850, 300))
    per = np.interp(eang1, (120, 170), (0, 100))
    cv2.rectangle(image, (200, 300), (260, 850), (0, 255, 0))
    cv2.rectangle(image, (200, int(bar)), (260, 850), (0, 255, 0), cv2.FILLED)
    cv2.putText(image, f'{int(per)} %', (200, 255), fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                fontScale=1.1, color=(0, 255, 0), thickness=4)
except:
    pass

Now, we check whether the back angle is greater than 170°. If it is, plankTimer starts tracking the time and the elapsed duration is accumulated in plankDuration.

If the user is unable to maintain the correct posture, the timer will stop, indicating that they need to correct their posture.
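
Planks are timed holds, but the same angle signal drives rep counting for the other exercises. The per-exercise files may implement this differently; a simple hysteresis-based counter (thresholds here are illustrative, e.g. the elbow angle for pushups) looks like this:

class RepCounter:
    """Counts reps from a joint angle using two thresholds to avoid double counting."""

    def __init__(self, down_threshold=90, up_threshold=160):
        self.down_threshold = down_threshold  # angle at the bottom of the rep
        self.up_threshold = up_threshold      # angle at full extension
        self.stage = "up"
        self.reps = 0

    def update(self, angle):
        # A rep is counted only on a full down -> up transition
        if angle < self.down_threshold and self.stage == "up":
            self.stage = "down"
        elif angle > self.up_threshold and self.stage == "down":
            self.stage = "up"
            self.reps += 1
        return self.reps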

Features 🦄

  • Realtime & Pre-recorded Inputs: The application supports both a real-time webcam/camera input as well as pre-recorded workout videos. In real-time mode, users can engage with the application instantaneously by connecting their webcam or camera, enabling them to receive immediate feedback and track insights during their live workout sessions. This real-time feature is invaluable for individuals who prefer monitoring their form, counting reps, and receiving corrections on the fly. On the other hand, the application extends its utility to pre-recorded workout videos, allowing users to analyze and evaluate their exercise routines retrospectively. This feature caters to those wanting to review their past workouts, assess their performance over time, or receive feedback on previously recorded sessions.
Realtime & Pre-recorded Input Support
  • Performance Optimization: In optimizing the performance of the MediaPipe library, we leverage image processing techniques to refine and streamline the data fed into the system. Despite MediaPipe’s inherent lightweight and fast nature, our customization involves narrowing the focus to extract only the skeleton key points essential for analyzing specific exercises. Take pushups as an example — instead of processing the entire set of key points, we selectively concentrate on key points associated with the wrists, elbows, shoulders, and ankles. We significantly boost the application's efficiency by tailoring the input to include only the relevant skeletal markers for each exercise. This targeted approach enhances the speed of keypoint analysis and ensures that the computational resources are allocated precisely where needed, contributing to a more optimized and responsive performance overall. (A small keypoint-filtering sketch follows this list.)
Performance Optimization in GymLytics
  • Posture Correction: GymLytics employs a sophisticated approach that combines keypoint detection, analysis, and limb angle assessment to offer a detailed understanding of each exercise. Taking the example of planks, GymLytics meticulously analyzes the key points corresponding to the shoulder, hip, and ankle. By examining the spatial relationships among these crucial points, GymLytics derives precise information about the participant’s body alignment during the exercise. Specifically, in planks, the application focuses on determining the optimal back angle to ensure the exercise is performed with the correct form. This multifaceted analysis gives users real-time feedback on their posture, allowing them to make immediate adjustments and achieve the ideal body position for each exercise.
Posture Correction in GymLytics
  • Visual Feedback: The application employs an intuitive visual feedback system to enhance the user experience. Progress bars are displayed on either side of the screen, serving as a dynamic visual aid to track the progress of each repetition. These progress bars dynamically adjust as the user performs the exercise, offering a real-time representation of their completion status. Additionally, a timer or counter positioned near the user’s head provides a quick and accessible means to track the duration or count of the ongoing exercise. This strategic placement ensures that users can effortlessly monitor their performance without diverting their attention from the exercise.
Visual Feedback
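
To ground the performance-optimization point above, here is a small sketch of restricting the analysis to the landmark indices an exercise actually needs (indices follow MediaPipe’s pose numbering; the exact sets GymLytics uses may differ):

# MediaPipe pose indices: 11/12 shoulders, 13/14 elbows, 15/16 wrists, 27/28 ankles
PUSHUP_LANDMARKS = {11, 12, 13, 14, 15, 16, 27, 28}

def filter_keypoints(idx_to_coordinates, wanted=PUSHUP_LANDMARKS):
    """Keep only the detected keypoints relevant to the current exercise."""
    return {idx: point for idx, point in idx_to_coordinates.items() if idx in wanted}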

Conclusion

Embarking on the intersection of my passion for fitness and the possibilities of technology, GymLytics stands as my brainchild — an open-source project that harnesses the dynamic duo of computer vision and machine learning to revolutionize how we engage with our workouts. GymLytics aims to become a virtual fitness companion, automatically analyzing exercises, providing real-time feedback on posture correction, and keeping tabs on the completion of reps. The heart of GymLytics beats with a systematic pipeline, encompassing camera setup, keypoint detection, exercise classification, posture correction, and data logging. It’s not just a project; it’s a personal journey to bridge the gap between fitness and technology.

As GymLytics takes its place in the open-source landscape, it’s not just about fitness tracking; it’s a personal journey of combining my love for coding with my dedication to a healthy lifestyle. It embodies the belief that technology can enhance our well-being, making fitness not just a routine but a personalized and enriching experience.

GymLytics | Lunges

Made with ❤️ and 🦙 by Akshay
