I have previously worked at Leena AI as a full stack engineer. Outside of work, I'm passionate about making a positive impact on society. That led me to start Ralith Milith, an anti-drug society in Kashmir.

"If you cannot do great things, do small things in a great way."
- Napoleon Hill
Federated learning has emerged as a promising paradigm for collaborative machine learning, enabling multiple clients to jointly train a model while preserving data privacy. Tailored federated learning takes this concept further by accommodating client heterogeneity and facilitating the learning of personalized models. While the use of transformers within federated learning has attracted significant interest, the effects of federated learning algorithms on the latest focal modulation-based transformers remain under-investigated. In this paper, we investigate this relationship and uncover the detrimental effects of the federated averaging (FedAvg) algorithm on focal modulation, particularly in scenarios with heterogeneous data. To address this challenge, we propose TransFed, a novel transformer-based federated learning framework that not only aggregates model parameters but also learns tailored focal modulation for each client. Instead of employing a conventional customization mechanism that maintains client-specific focal modulation layers locally, we introduce a learn-to-tailor approach that fosters client collaboration, enhancing scalability and adaptation in TransFed. Our method incorporates a hypernetwork on the server that learns personalized projection matrices for the focal modulation layers, enabling the generation of client-specific keys, values, and queries. Furthermore, we provide an analysis of adaptation bounds for TransFed under the learn-to-tailor mechanism. Through extensive experiments on pneumonia classification datasets, we demonstrate that TransFed, combined with the learn-to-tailor approach, achieves superior performance in scenarios with non-IID data distributions, surpassing existing methods. Overall, TransFed paves the way for leveraging focal modulation in federated learning, advancing the capabilities of focal modulation-based transformer models in decentralized environments.
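The server-side hypernetwork idea above can be sketched as follows. This is a minimal illustration, not the paper's architecture: the `ClientHyperNetwork` name, the layer sizes, and the choice of a small MLP over a learned per-client embedding are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

class ClientHyperNetwork:
    """Illustrative server-side hypernetwork: maps a learned client
    embedding to personalized projection matrices (Q, K, V) for a
    focal modulation layer."""

    def __init__(self, num_clients, embed_dim=16, hidden_dim=64, model_dim=32):
        self.model_dim = model_dim
        # One learnable embedding vector per client.
        self.client_embeddings = rng.normal(size=(num_clients, embed_dim))
        # Two-layer MLP producing three model_dim x model_dim matrices.
        self.W1 = rng.normal(scale=0.1, size=(embed_dim, hidden_dim))
        self.W2 = rng.normal(scale=0.1, size=(hidden_dim, 3 * model_dim * model_dim))

    def __call__(self, client_id):
        h = np.tanh(self.client_embeddings[client_id] @ self.W1)
        params = h @ self.W2
        q, k, v = params.reshape(3, self.model_dim, self.model_dim)
        return q, k, v

hyper = ClientHyperNetwork(num_clients=5)
q, k, v = hyper(client_id=2)
print(q.shape, k.shape, v.shape)  # one (32, 32) projection triple per client
```

Because the matrices are generated from shared hypernetwork weights rather than stored per client, clients implicitly share structure, which is the collaboration the learn-to-tailor approach exploits.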
Human pose estimation is the process of continuously monitoring a person's actions and movements to track their activity, usually by capturing key points that describe the pose. A guided practice framework that lets people learn and exercise activities such as yoga, fitness, and dancing remotely and accurately, without the help of a personal trainer, can be built on top of human posture recognition. This work proposes a framework to detect and recognize various yoga and exercise poses and help individuals practice them correctly. The popular BlazePose model extracts key points on the student's side, and the extracted key points are fed to the Human Pose Juxtaposition model (HPJT), which compares the student's pose with the instructor's. The model assesses the correctness of the pose by comparing the extracted key points and gives feedback to the student if any corrections are needed. The proposed model is trained on 40+ yoga and exercise poses; evaluated with mAP, it achieves 99.04%. The results show that anyone could use the proposed framework in real time to practice exercise, yoga, dance, etc. at their own location, without a physical instructor, with precision and accuracy, leading to a healthier life.
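The core comparison step can be illustrated with a simple sketch. HPJT's internals are not reproduced here; this assumes a plain normalized-distance comparison between matched key points, with the `pose_feedback` name and threshold chosen for the example.

```python
import numpy as np

def pose_feedback(student_kps, instructor_kps, threshold=0.1):
    """Compare two sets of 2D pose key points and return the indices of
    joints that deviate beyond a threshold. Both inputs are (N, 2) arrays
    of (x, y) coordinates in the same landmark order."""
    student = np.asarray(student_kps, dtype=float)
    instructor = np.asarray(instructor_kps, dtype=float)

    # Normalize each pose to its own bounding box so that body size
    # and position in the frame do not affect the comparison.
    def normalize(kps):
        mins, maxs = kps.min(axis=0), kps.max(axis=0)
        return (kps - mins) / np.maximum(maxs - mins, 1e-8)

    deviation = np.linalg.norm(normalize(student) - normalize(instructor), axis=1)
    return [i for i, d in enumerate(deviation) if d > threshold]

# Toy example with 4 key points: the student's second joint is off.
instructor = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (3.0, 0.0)]
student = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (3.0, 0.0)]
print(pose_feedback(student, instructor))  # [1]
```

In a real system the returned indices would be mapped back to joint names (elbow, knee, etc.) to phrase the corrective feedback.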
One of the most severe and pressing problems humanity faces in the 21st century is climate change. It is adversely affecting the entire globe, as is evident from rising ocean levels, accelerated melting of glaciers, more frequent storms, and much more. Accurate weather forecasting is crucial to understanding and mitigating the impacts of climate change. Cutting-edge data-driven models for weather forecasting typically employ recurrent or convolutional neural networks, with some models integrating attention mechanisms. This study introduces a novel approach, the FocalNet Transformer-based Climate Change Temperature Model (CCTM), specifically designed for temperature forecasting. The CCTM integrates tensorized modulation, enabling it to harness the spatial and temporal intricacies of climate parameter data by operating in a multi-tensor format. Comparative assessments against existing encoder transformer architectures, 3D ConvNets, LSTMs, and ConvLSTMs demonstrate the CCTM's superior ability to capture nuanced patterns inherent in the data, particularly in the context of temperature prediction. We evaluated our model on data we collected in the Jammu and Kashmir region and report state-of-the-art results. In addition, the CCTM is assessed on two authentic benchmark temperature datasets, showing a gain in accuracy of 12% over existing models. To interpret the CCTM, two modulation scores are introduced, derived from the tensorial modulation process. These scores are analysed to illuminate the decision-making process of our model, offering valuable insights into the key parameters influencing climate change.
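The temperature forecasting setup can be made concrete with a standard sliding-window preparation step. This is generic sequence-model preprocessing, not the CCTM's actual pipeline; the `lookback` and `horizon` values and the synthetic series are assumptions for the example.

```python
import numpy as np

def make_forecast_windows(series, lookback=30, horizon=7):
    """Slice a daily temperature series into (input, target) pairs:
    each input is `lookback` past days, each target the next `horizon`
    days, as consumed by sequence forecasting models."""
    series = np.asarray(series, dtype=float)
    n = len(series) - lookback - horizon + 1
    X = np.stack([series[i:i + lookback] for i in range(n)])
    y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, y

# Synthetic one-year temperature curve for illustration.
temps = np.sin(np.linspace(0, 12 * np.pi, 365)) * 15 + 20
X, y = make_forecast_windows(temps)
print(X.shape, y.shape)  # (329, 30) (329, 7)
```

For multi-parameter inputs (the multi-tensor setting the abstract describes), each window would carry an extra axis over climate parameters rather than a single temperature channel.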
In this paper, we present a Climate Change Parameter Dataset (CCPD), intended to enable state-of-the-art results on parameters that affect climate change, including forest cover, water bodies, agriculture and vegetation, population, temperature, construction, and air index. The dataset can be used by the research community to validate claims made in relation to climate change. The research community has been deeply involved in extending machine learning algorithms to the effects of climate change; however, the non-availability of sufficient data related to climate change parameters has limited research in this domain. By presenting this dataset, we want to facilitate researchers. In this dataset, we provide a large variety of statistical and satellite data acquired through various image processing techniques and on-ground data collection. The data is collected in abundance for a specific region, and then various machine learning techniques are used to extract the useful data related to each parameter separately. We call this amalgam of processed data the CCPD. The CCPD contains over 6000 data points across all seven parameters and covers data from 1960 onwards. We hope this dataset will aid the research community in tackling climate change with the help of AI.
The area of computer vision has gone through exponential growth and advancement over the past decade, mainly due to the introduction of effective deep-learning methodologies and the availability of massive data. This has resulted in the incorporation of intelligent computer vision schemes to automate a number of different tasks. In this paper, we have worked along similar lines. We propose an integrated system for the development of robotic arms covering fruit identification, classification, counting, and mask generation through semantic segmentation. The current method of carrying out these processes manually is time-consuming and not feasible for large fields. Consequently, multiple works have been proposed to automate harvesting tasks and minimize the overall overhead; however, there is a lack of an integrated system that can automate all these processes together. As a result, we propose one such approach based on different machine learning techniques, using the most effective learning technique with computer vision capability for each process. The result is an integrated, intelligent, end-to-end computer-vision-based system to detect, classify, count, and identify apples. In this system, we modified the YOLOv3 algorithm to detect and count apples effectively. The proposed scheme works even under variable lighting conditions. The system was trained and tested on a standard benchmark, MinneApple. Experimental results show an average accuracy of 91%.
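The counting step behind a detector like this can be sketched with greedy non-maximum suppression: overlapping boxes for the same apple are merged before counting. This is a generic illustration of the technique, not the modified YOLOv3 itself; the `count_apples` helper and the example boxes are made up for the sketch.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def count_apples(detections, iou_threshold=0.5):
    """Greedy non-maximum suppression over (box, score) detections;
    the number of surviving boxes is the apple count."""
    detections = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in detections:
        if all(iou(box, k) < iou_threshold for k in kept):
            kept.append(box)
    return len(kept)

# Three raw detections, two of which overlap heavily (the same apple).
dets = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.95)]
print(count_apples(dets))  # 2
```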
Higher Secondary
Class X with 92.6%
Class XII with 91.8%

Community
In my background story, I pivoted into an academic lifestyle to pursue my curiosity and conduct research in Computer Vision and Medical Imaging. I even shifted domains, starting from IoT and moving to robotics. I truly believe in open-source work as a way to educate others (including myself) on how to critically assess information from the vast pool available on the Internet. Credit must be given to all my academic mentors and peers, whose guidance was influential at an early stage in my career and helped me overcome this challenge.
I would like to continue the tradition and give back to the community, so I am willing to mentor a few early-stage undergraduate/master's students interested in my direction in computer vision. Personally, I encourage students with diverse backgrounds, or with journeys similar to mine, to reach out.
If you are interested, send me an introductory email about yourself, describe your academic background, and preferably attach your CV. [Email]
I have built this website from a modified version of templates from here and here. Feel free to use this template from GitHub [code]; avoid scraping the HTML here due to unwanted tags. If you are a developer, you can also have a look at my older website. [Template]