Particle tracking detectors allow the study of elementary particle interactions and precise measurement of particle properties by observing their tracks. Robust tracking algorithms are nowadays a fundamental component of all tracking detectors. Conventional track data reconstruction is performed in three steps. First, 3D tomographic images of the emulsion detector are acquired using automated scanning microscopes. Next, the positions of silver grains ("hits") are located in the 3D image volumes, and finally tracks in the detector volume are reconstructed as sequences of hits. Several tracking algorithms were developed as the scanning systems evolved, allowing efficient track reconstruction in real time during acquisition. While they satisfy the needs of many experiments, they have several drawbacks. Adaptation to different experimental conditions, e.g. high track density or high background level, requires tedious calibration, ranging from extensive parameter tuning to dedicated test runs using e.g. accelerator beams. In addition, when hit extraction is separated from track reconstruction, the tracking algorithm cannot fully exploit the information available in the raw image data, which compromises performance, especially in cases of high background or high track density. Incorporating tracking based on classical Deep Learning, where the track parameters are predicted from the raw image data, would naturally address the latter issue.
Yet, to train such a model in a supervised manner, one would either need to provide a massive amount of labeled 3D raw image data for each experimental case, or training would need to be performed largely on simulated datasets. Training such models in an unsupervised manner, i.e. where no track parameters (labels) are provided during training, can address both issues simultaneously: it leverages the raw image data for efficient track reconstruction and allows simple adaptation to new configurations, requiring only the raw image dataset. Here we introduce a tracking approach based on a Deep Convolutional Autoencoder model that learns to disentangle the factors of variation in a geometrically meaningful way, in a fully unsupervised manner, by imposing equivariance with respect to spatial transformations. The reconstruction constraint alone fails to disentangle the factors of variation in a meaningful way, and we show that adding a simple constraint on translational invariance along the track line does not lead to an improvement. We demonstrate that incorporating more sophisticated transformations in the latent representation is required to avoid the reference ambiguity.
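To make the notion of an equivariance constraint concrete, the following is a minimal illustrative sketch, not the model used in this work. It assumes a 2D toy image instead of 3D emulsion data and replaces the Deep Convolutional Autoencoder with two hand-written stand-in encoders; all function names (`make_blob`, `centroid_encoder`, `intensity_encoder`, `equivariance_loss`) are hypothetical. The loss compares the code of a translated image with the translated code of the original image.

```python
import numpy as np


def make_blob(shape, center, sigma=1.5):
    """Synthetic 2D image with one Gaussian blob standing in for a grain."""
    yy, xx = np.indices(shape)
    cy, cx = center
    return np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2.0 * sigma ** 2))


def centroid_encoder(image):
    """Toy encoder: intensity-weighted centroid (row, col) as a 2D code.

    Hypothetical stand-in for a learned encoder; its code moves with the image.
    """
    yy, xx = np.indices(image.shape)
    total = image.sum()
    return np.array([(yy * image).sum() / total, (xx * image).sum() / total])


def intensity_encoder(image):
    """Toy encoder whose code is translation invariant (total intensity)."""
    total = image.sum()
    return np.array([total, total])


def equivariance_loss(encoder, image, shift):
    """Squared distance between encoder(shifted image) and encoder(image) + shift.

    The translation is a circular shift by whole pixels, which is an
    assumption of this sketch; the blob must stay away from the borders.
    """
    shifted_image = np.roll(image, shift, axis=(0, 1))
    expected_code = encoder(image) + np.asarray(shift, dtype=float)
    return float(np.sum((encoder(shifted_image) - expected_code) ** 2))


if __name__ == "__main__":
    image = make_blob((32, 32), (16, 16))
    shift = (3, -2)  # rows, columns
    print("centroid encoder :", equivariance_loss(centroid_encoder, image, shift))
    print("intensity encoder:", equivariance_loss(intensity_encoder, image, shift))
```

In this toy setting the centroid code satisfies the constraint up to numerical error, while the translation-invariant code is penalized by the squared length of the shift (here 3² + 2² = 13). In a trained model such a term would be added to the reconstruction loss rather than evaluated on fixed encoders.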