Visualizing the pulsar population using graph theory

The $P\dot P$ diagram is a cornerstone of pulsar research. It is used in multiple ways for classifying the population, understanding evolutionary tracks, identifying issues in our theoretical reach, and more. However, we have been looking at the same plot for more than five decades. A fresh appraisal may be healthy. Is the $P\dot P$-diagram the most useful or complete way to visualize the pulsars we know? Here we pose a fresh look at the information we have on the pulsar population. First, we use principal components analysis over magnitudes depending on the intrinsic pulsar's timing properties (proxies to relevant physical pulsar features), to analyze whether the information contained by the pulsar's period and period derivative is enough to describe the variety of the pulsar population. Even when the variables of interest depend on $P$ and $\dot P$, we show that $P\dot P$ are not principal components. Thus, any distance ranking or visualization based only on $P$ and $\dot P$ is potentially misleading. Next, we define and compute a properly normalized distance to measure pulsar nearness, calculate the minimum spanning tree of the population, and discuss possible applications. The pulsar tree hosts information about pulsar similarities that go beyond $P$ and $\dot P$, and are thus naturally difficult to read from the $P\dot P$-diagram. We use this work to introduce the pulsar tree website http://www.pulsartree.ice.csic.es containing visualization tools and data to allow users to gather information in terms of MST and distance ranking.


INTRODUCTION
Ever since the discovery of the first pulsar (Hewish et al. 1969), the plot containing the period derivative versus the period of every pulsar found, or simply, the  -diagram, has been used as a way of summarizing our knowledge and guiding our research on the pulsar population.Classes of pulsars and possible links among them are referred to in this plot, as it is what we know about their possible evolutionary tracks along the pulsar's lifetime (see e.g., Enoto et al. (2019) for a review).However, we have been looking at the same plot for more than five decades.A fresh appraisal may be healthy.Is the  -diagram the most useful or complete way to visualize the pulsars we know?Does it introduce any unwarranted bias on what we consider to be similar pulsars?Here, we use principal components analysis (PCA, e.g., Pearson (1901); Shlens (2014)) to show that even when the variables considered to describe a pulsar would all depend on  and , the variance of the population is not contained in the variance of the latter quantities.Thus, we note that any nearness ranking or visualization based only on  and  is potentially misleading.Next we define and compute a properly normalized distance to measure nearness from one pulsar to another, calculate the minimum spanning tree (MST, e.g., Gower & Ross (1969)) of the pulsar population, and discuss possible applications.The pulsar tree hosts information about pulsar similarities that go beyond  and , ★ E-mail: crodriguez@ice.csic.es† E-mail: dtorres@ice.csic.es‡ E-mail: patruno@ice.csic.esand are thus naturally difficult to read from the  -diagram.We also introduce here an online tool encompassing all our results, as well as allowing a user to focus on user-defined problems.
The MST is a graph that connects points in a multi-dimensional space.Each point (or node) is linked to at least another by an edge, whose length is associated with a given distance.The edges of an MST are chosen so that the the sum of their lengths is minimal and all nodes are linked, implying no cycles are present (no paths starting and returning to the same node).Graph theory shows that as long as distances are distinct, the MST is unique.Its very definition intuitively implies that the MST is an optimization technique.In fact, it was widely used in engineering problems, starting from their original application developed by Borůvka in 1926, for the distribution of electricity in Moravia, see Nešetřil et al. (2001).Currently, MSTs are used from analyzing cognitive impairment (Simon et al. 2021) to risk in financial markets (Pozzi et al. 2013).Early usage in scientific problems include describing the interrelationship of species or genetics (see the work of Florik in the 1950's and Edwards in 1960's as commented in Hartigan (1981) and (Winther 2018), respectively), disciplines in which it is a widespread technique.In astronomy, it has been used for finding high-energy sources (Campana et al. 2013), establishing differences between cluster and field stars (Sánchez et al. 2018), detecting filaments (Pereyra et al. 2020), galaxy clustering (Barrow et al. 1985), and cosmology (Naidoo et al. 2020).It has also been used in the analysis of event samples in particle colliders (Lovelace Rainbolt & Schmitt 2017), or cosmic-rays (Harari et al. 2006).This non-exhaustive reference list is just an example of growing interest in MST use across different fields.Despite this interest, it has been barely been used in relation to pulsars.To our knowledge there is only one related publication (Maritz et al. 2016) using 11 handpicked objects.The aim was to show that an MST could distinguish binaries from isolated pulsars using the dispersion measure as distance.In this paper, we explore using the MST to provide a novel classification, alerting, and visualization tool for pulsars.

Variables definition
We consider pulsars listed in the ATNF Catalog (Manchester et al. 2005), including radio pulsars, X-ray and/or gamma-ray pulsars, and magnetars for which coherent pulsations have been detected.Accretion-powered pulsars such as, e.g., SAX J1808.4-3658, are not included in this table, however.The current number of pulsars listed in the catalog is 3282, of which 2509 have a known period and period derivative (larger than 0).From the latter, 2242 are isolated pulsars and 267 are pulsars residing in binary systems.All the methods considered in this work will be applied to this set as a whole, without making any distinction among the pulsars in it.For characterizing the pulsar population, and ultimately defining a 'distance' from one pulsar to another, we shall consider the following physical set of pulsar variables (see, e.g., Lorimer & Kramer (2012) for reference): • Spin period derivative: • Surface magnetic flux density (equator): • Magnetic field at the light cylinder: • Spin-down energy loss rate: • Characteristic age: • Goldreich-Julian charge density: Here the moment of inertia  was assumed as 10 45 g cm 2 , the radius of the star  was assumed as 10 km, and the inclination  between the magnetic and rotation axes as 90  .The remaining constants (, ) have the usual meaning.
The measurable quantities  and  are the leading magnitudes in this set of variables, from which all others are calculated using the rotating dipole model, as is usual for pulsar estimations.
The surface magnetic field and spin-down power are basic magnitudes critical to characterize the pulsars' energetics and their magnetospheres.In our set, they are complemented with others to incorporate the fact that dissimilar pulsars (e.g., millisecond and normal) can have similar magnetospheres.This is partly described by the magnetic field at the light cylinder, which is similar in both cases.The voltage gives the potential drop between the magnetic pole and the edge of the polar cap.It is thought here to represent the variety introduced by the electromagnetic configuration, for which another parameter of interest is the Goldreich-Julian charge density, ∝   /.Note that these magnitudes, being all functions of  and , can have a relationship between themselves, as just noted.
The mass and the radius of the neutron stars would ideally also be considered as part of the variables of interest in our study.However, this information is only available for a very small percentage of the sample.
Other variables of interest are those related to the birth properties of pulsars, such as the initial spin-down power, the initial magnetic field, or the spin-down timescale.However, these are not known for most pulsars in our sample.They all depend on the unknown (except for a few) pulsars' real age (for which the characteristic age   is only a proxy).Similarly, the braking index is measured for just a handful of pulsars (and in addition, it is known that it may vary significantly).Estimates from it using  would significantly reduce the sample.Finally, other measurable quantities exist that are unrelated to intrinsic properties (transverse velocities, DM, distances) and/or are known for a limited number of objects.Using luminosities and other properties at different frequencies (e.g., fluxes, pulse shapes, peak separation, etc) would also significantly cut the sample, would be affected by extrinsic conditions (absorption, distance), and/or would incorporate parameters that are difficult to compare for the population as a whole.These are left for analysis in future work, where particular sub-samples will be looked at in more detail taking into account different variables, focusing here only on intrinsic pulsar features to introduce the technique.

Treating variables
The values of the magnitudes considered may differ by several orders of magnitude for different pulsars.The distributions of the logarithm of these variables are shown in Fig. 1.The distributions are not normal (as is also the case for the set of original variables, without logarithm).Two clear populations of millisecond and normal pulsars appear separately in all plots, except the two related to the spin-down power and voltage.As is obvious from their centralization (mean and median) and dispersion (standard deviation and interquartile range (IQR)), the log-variables are orders of magnitude closer together.Given that the distributions are not normal, we use the robust scaler for normalizing the log variables, where the †-symbol represents that the quantity   has been normalized,  1 ,  2 and  3 represent the 1st quartile, median, and 3rd quartile of the distribution, respectively, and IQR is the interquartile range, ( 3 −  1 ).It can be seen that the distributions of the variables after being normalized have a median equal to zero and an IQR equal to one.Note that if we take the logarithm of the variables and then apply Eq. ( 1) to normalize it, relations between variables are more clearly uncovered.For instance,   and ΔΦ, lead to log  †  = log ΔΦ † , that is also visible in the corresponding distributions of Fig. 1.Considering both of them at once in defining the nearness of two given pulsars relates to the fact that the physical meaning represented by the two original magnitudes is different.

PRINCIPAL COMPONENTS ANALYSIS
PCA is especially suitable to isolate the main factors introducing variance in a population when the variables known for it are not independent (see Appendix 1).Since 6 of the variables taken to describe the intrinsic properties of the pulsar population are in fact computed from  and , one can intuitively conclude that 2 principal components (PCs) are needed to describe our population variance.However, the latter is not contained in the variance of  and .Said otherwise,  and  are not the two principal components needed.
Thinking only in terms of  and  to compare pulsars may thus be misleading, except in the extreme case where these values are simply the same.Fig. 2 shows the result of the PCA analysis.The two PCs are, where the † -quantities refer to the normalized ones as in Eq. ( 1), and the sub-index  stands to note that the normalization is applied to the logarithm of the variable.After some algebra (see Appendix 1) these latter equations can be reformulated as 2 = 14.182 − 2.931 log  + 1.105 log . (5) Note that no dag marking is herein needed, as we have absorbed the corresponding IQR and median of each variable into the coefficients and that the units of  and  are as in Eq. (3).The left panel of Fig. 3 shows the pulsar population in the log  − log  together with lines representing equal values of the principal components.The right panel of Fig. 3 shows the same pulsars but directly in the plane  1 ,  2 .Nearness in one plane has not the same meaning as it has in the other.To exemplify this, we plot a circle in the ,  diagram and see how the circle transforms to the  1 ,  2 plane via the equations above.This is shown in the second row of Fig. 3; the difference in relative distances can be up to a factor of 3 or more.This change of shape advances the idea that any sort of nearness ranking will be affected if considering the PCs, instead of  and  (further comments about this can be found in the Appendix).

MINIMUM SPANNING TREE OF THE PULSAR POPULATION
Appendix 2 introduces all concepts needed to understand and compute an MST, and we take this for granted in what follows.We define an Euclidean distance using the eight normalized (via Eq. 1) logarithm of the variables introduced above.Equivalently, after our analysis of the previous section, we can use just two variables according to the PCA analysis, that contains the whole population variance.Both choices end up producing the same MST (and thus the results from the analysis that follows from it are the same) but the latter is less demanding given the reduced dimensionality of the problem.With this Euclidean distance, we first obtain a complete, undirected and weighted graph  (, ) =  (2509, 3146286), with | | nodes and | | edges, where each edge is defined by a specific weight value.From that, we obtain the pulsar MST, which is shown in Fig. 4.

Branch analysis and pulsar classification in the MST
As shown in Fig. 5, mixing nodes from different branches of the MST produces a scattered distribution of variables.This is a generic behavior that happens for any mixing of the branches in the MST, and for any mixing of the nodes even within a single branch (see the panels in the three last rows of Fig. 5).If we read the MST in a disordered manner, nothing is learned from it.Instead, Fig. 6 shows that if we choose one of the branches at a time and run along with the nodes in it in an orderly manner, a smooth behavior of the variables naturally appears.Mixing branches of the pulsar tree is equivalent to grouping pulsars by their nearness in the  -diagram.This is visually shown in Fig. 7.The pulsar tree hosts information that is difficult to read from the  -diagram.

The MST as a descriptive tool
The ordering introduced by each of the branches is indicative that there is an internal physical grouping in the MST.This is shown in Fig. 8, where we also show the variation of the principal components  1 and  2 .These variations serve to visualize the physical properties of different pulsar classes, and understanding them may lead to physical connections among pulsars, or links in their evolution.To emphasize this, we shall observe how some known groups of pulsars locate in the MST.
The main tree trunk travels from young and energetic pulsars in the bottom1 to millisecond pulsars in the top (see panels -referred from left to right and top to bottom-1, 2, and 5 of Fig. 8).However, other than representing the overall separation of the binary pulsars from the rest of the sample, the variables used in this MST do not allow us to dig deeper into sub-population of binaries: they are heavily affected by the evolutionary processes that occur for MSPs during the accreting (recycling) phase.As an example, consider black-widows (BWs), redbacks (RBs), and transitional pulsars (tMSPs) (see Papitto & Bhattacharyya (2022) for reviews).Black Widows and redbacks are binary MSPs where the companion is a semi-degenerate star or a main sequence star, respectively.The tMSPs (of which only two tMSPs (of the three known) have measured values of  and ) are instead binary MSPs that switch between an accreting/high X-ray emission state and a radio pulsar state.No clear grouping appears for these binaries sub-samples (see Fig. 9), as the intrinsic pulsar in them are similar.In future work, we shall supplement intrinsic variables with other representing the companion, the orbital parameters, and the environment of binary pulsars to try to address these issues.
The branches departing from the main trunk cover deviations that are better represented by the variability in one or a few of the magnitudes considered.The extremes of each branch are thus extreme pulsars of the population in a particular way.
For instance, the longest rightwards branch (moving along it towards the rightmost node) groups pulsars with increasing age, period, and period derivative.Pulsars in this branch are not particularly energetic nor do they have large magnetic fields at the light cylinder.Instead, they have an increasing surface field, reaching up to extreme values.The second-longest rightwards branch has similar behavior, but is formed by less magnetized objects at their surface, and also less energetic, slower, and older.Not surprisingly, then, magnetars are located in both of these branches, as is also the only XDIN, J1856-3754, quoted in the ATNF catalog.We show this in more detail in Fig. 9. Interestingly, the XDIN and the magnetars do not share the same branch.Some of the low-field magnetars, since they are more energetic, less magnetized objects that have nevertheless shown magnetarflaring behavior, appear quite separate from the rest.In fact, the low-field magnetars depicted in the MST having  < 1 are J1846-0258 (Gavriil et al. 2008) and J1119-6127 (Archibald et al. 2016), and they appear in Fig. 9 in the lower leftmost branch, one almost on top of the other in this scale.This is a very different location from where J2301+5852, J1647-4552, J1822-1604 and J0418+5732, the other low-field magnetars, are.They are all located at the end of the branch going rightwards, above the magnetars.
The different branches at the bottom of the MST contain all energetic pulsars.They share the same values of light cylinder magnetic fields of the millisecond pulsars, at the other end of the MST, despite they are very different in almost every other aspect.The small branches in which the energetic pulsars divide at the bottom of the plot also separate them into those having a significantly higher value of different variables; like ,   ,   .Spin period, log P (s) Period derivative, log Ṗ (s s   2018)).The MST by itself cannot judge the reality of the association proposed, but emphasizes how distinct these two are with respect to the rest.The case of CTA 1 has been studied in detail as a possible PWN in the reverberation phase, and/or having a higher magnetization (Martín et al. 2016).Its peculiarity has been already noted from a physical standpoint, although it is less of an outlier in comparison to the rest of the PWNe population (both in the MST and in comparing PWNe models).
Another interesting example of the MST view on pulsars is to note where the Fermi-LAT detected gamma-ray pulsars that are part of the ATNF fall on it (see Abdo et al. (2013); Fermi-LAT Collaboration (2021)).Fig. 9 shows two panels to this effect, where the ranges they have for the light cylinder magnetic field and spin-down power are noted.The detected gamma-ray pulsars cluster at specific locations of the tree, most of it being empty of gamma-ray emission.This corresponds to the fact that gamma-ray pulsars have high values of light cylinder magnetic field,   > 100 G, with most having actually   > 10 3 G, and relatively high spin-down power.Some other magnitudes are less decisive, e.g., there are gamma-ray pulsars across the full range of    -values.Along the branches where the Fermi-LAT pulsars lie there might be a close sequence of detected and non-detected pulsars, despite the MST clearly showing the similarity of their intrinsic properties.This is a representation that extrinsic features such as distance, environment, or geometry play a role in Fermi-LAT detectability at an individual level.
The Fermi-LAT pulsar isolated in the central part of the MST is J2208+4056, the only one depicted with   < 100 G.This pulsar has been noted by Smith et al. (2019) as having a spin-down (∼ 8 × 10 32 erg s −1 ) about three times lower than the previously observed gammaray emission death-line.The outlier Fermi-LAT pulsar in the left part of the MST is J1231-5113 and has an even lower spin-down power (∼ 5×10 32 erg s −1 ).In comparison, the other magnitudes   , ,    are similar to the rest of the gamma-ray population.
We also investigate the position in the MST of the 40 rotating radio transients (RRATS) with known  and  appearing in the ATNF (Abhishek et al. 2022;Cui & McLaughlin 2022).RRATS are pulsars showing extreme radio variability, as most of them are discovered through their single, isolated pulses.Only one of the 40 RRATs considered is located in the main trunk.This pulsar is near the degree 3 node J1828-1336 that separates the main trunk in the two central branches.
The MST can also be used to analyze any other pulsar population, for instance, where do the pulsar's known glitches, or the intermittent and nulling pulsars, group.The online tool provided with this work promotes this kind of analysis.

The MST view of evolutionary tracks
While pulsars evolve, they change their timing parameters and move across the  -diagram.Evolutionary models are then constructed following the fully coupled evolution of temperature and magnetic field in neutron stars (e.g., see Viganò et al. (2013)).To simulate evolutionary tracks, we have created synthetic pulsars over the theoretical tracks of figure 10 of Viganò et al. (2013) and individually studied where would they fall in the MST should they be part of our sample.This is shown in Fig. 10 where the arrows show increasing age at a fixed initial magnetic field and the rounded cap point in the origin of the arrows shows possible birthplaces.We find, in agreement with what was already discussed in Fig. 6, tracks are not randomly found in the MST.There are two birth zones in it, at the bottom part where we find all energetic pulsars, and at the rightmost branch, where we find the magnetars.Then, for low initial magnetic field values, pulsars go on to die on the main trunk of the MST.When the field increases, the pulsars populate the middle branches.When the initial field is high enough to shift the birthplace from the energetic pulsar zone to the classical magnetar range, the evolution is mostly confined to the two rightmost branches in Fig. 10.In these branches then, we find pulsars evolving in the two directions (from the main trunk to the extreme and vice-versa) depending on their place of origin.

The MST as an alerting tool
In addition of providing a descriptive perspective, the MST might be used as an alerting tool for pulsars-of-interest.These are some examples taking into account the MST location and the distance ranking.Implied connections are not always obvious using (, ) only (see the discussion on the PCA and the distance ranking above and in the appendices).We use the web application provided with this work to note the following: • Based on the location of the energetic low-field magnetars J1846-0258, J1119-6127 at the bottom part of the MST, and due as well to its nearness ranking, other pulsars with essentially the same characteristics are noted, in particular, J1208-6238.It has been suggested as a possible low-field magnetar in the literature (Clark et al. (2016)) and is second (first) in the distance ranking of J1846-0258 (J1119-6127) after J1119-6127 (followed by J1846-0258).PSRs J1513-5908 (in the composite SNR MSH 15-52), J1640-4631, and J1930+1852 follow in the ranking of J1846-0258; and J1640+4631, J1614-5048, and J1513-5908 do so in the ranking of J1119-6127.
• Panel 5 of Fig. 8 shows that few locations of the MST showing   ∼ 100 G or beyond and no detected gamma-ray pulsars yet.These regions become of special interest for future searches.In particular, those near J1231-5113 (which is already detected) in the MST appear to be promising potential targets.A few neighboring pulsars to the outlier J1231-5113, at the end of this branch, also show a relatively large   with a similar range of spin-down power, and other variables, in comparison to Fermi-LAT pulsars.Likewise, PSR J1915+1616 and J2129+1210B at the end of the nearby branch are of interest.Again, note that both have   > 10 3 G, and  > 10 33 erg s −1 .
• The detection of the radio pulsar PSR J2208+4056 by the Fermi LAT in spite of its low spin-down power has been ascribed by Smith The middle and right panels show the behavior of the 8 normalized variables and of the 2 principal components when following an arbitrary mixing of the nodes in the highlighted part of the MST.Third, fourth and fifth rows: the same, but now arbitrarily mixing the nodes of only one branch.Note that for the purpose of plotting both  1 and  2 in the same scale the right panels of Fig. 5, and also Fig. 6, we have subtracted the corresponding mean -from the whole set-from each set of values.et al. ( 2019) to a possible case of favorable geometry.If this is the case, it may remain indeed isolated in the MST, where appears close to the main trunk.Its closest neighbours (J0532-6639, J0502+4654, and J1848-0123) call for attention in order to test this.
• Others pulsars-of-interest regarding their possible detection in gamma-rays maybe J1818-1607 and J1550-5418.These lie in the magnetar branch, where no other detected Fermi-LAT pulsar is located (Li et al. 2017).The latter authors established a Fermi-LAT integrated upper limit for J1550-5418 and attempted folding, finding no signal.However, these pulsars have similar properties to other pulsars already detected by Fermi-LAT (ordered by distance, J1208-6238 occupies the third position, followed by J1119-6217, both detected by Fermi-LAT).In the case of the pulsar J1550-5418, among its ten closest pulsars according to the calculation of the Euclidean distance, the pulsars J1734-3333, J1746-2850 and J1726-3530 are in the sixth, eighth and ninth positions, respectively.They are located  5, but here following the node sequence as appearing along the marked branch of the MST.The node sequence is here read from left to right in branches that appear mostly vertical, where they are read from top to bottom).
in the lower part of the MST, where there is a high density of Fermi-LAT pulsars, although they have not yet been detected in gamma-rays.The distance ranking of these two magnetars is uncommon to others: other magnetars in the branch have neither a Fermi-LAT pulsar nor other pulsars located in high-density areas of Fermi-LAT detections in the first positions of their distance ranking.
• The pulsars J1842-0905 and J1457-5902, and J1413-6141 and J1907+0631 are the closest neighbours to the pulsar wind nebulae J1745-3040/PWN B1742-30(1) and J0007+7303/PWN CTA 1, respectively.The latter are somewhat outliers of the pulsar wind nebulae population (see above) and thus their neighbours are of interest to test whether this region of the pulsar parameter space is prone to producing observable nebulae.
• Finally, the higher-degree nodes -particularly those connected with the main trunk, the nodes that are extremes of significant branches, and possibly other topologically selected nodes, are suggested for individual study, as they may have undiscovered relevance.
our MST, is only based on intrinsic pulsar properties.Thus, it may well be the case (as it so happens) that one of the noticed pulsars above is in the LMC (J0532-6639): in such occasions the MST may just be pointing that intrinsically this pulsar may emit similarly to others detected, despite it maybe too far to be seen with current instrumentation.

DATA AVAILABILITY AND THE PULSAR TREE WEB
The pulsar tree web2 accompanying this paper contains visualization tools and data to produce all plots and go beyond what has been presented in this paper.It allows the readers to gather information in terms of MST localization,   comparison, and distance ranking.Among the functionalities included already, it can locate a given pulsar of the sample in the MST,  ,  1  2 -diagrams and on any other diagram using the variables adopted in this paper; identify all properties of the given pulsar and all neighbouring nodes both in the MST; zoom around a given portion of the MST (or on any of the other diagrams); obtain tables of the properties of the nodes in the region of interest; obtain tables of the distance ranking for any pulsar, and more.As our research continues, we expect to upgrade the pulsar tree web with new functionalities.

DISCUSSION
We have looked at the pulsar population from a different perspective from the usual   diagram.Instead of considering just  and  we used a set of 8 variables as proxies for intrinsic physical properties of all pulsars.While these variables all depend on  and , the variance of the population is not fully contained by the variance of the latter quantities. and  are not the principal components.Distance ranking (or visualizations) based only on  and  may hide interesting connections and mislead our intuition.We subsequently computed the pulsar MST, using a properly normalized Euclidean distance, discussed its properties and how the different classes of pulsars find an ordered place into it.

APPENDIX 1: PRINCIPAL COMPONENT ANALYSIS
The principal component analysis (PCA) is a technique used to reduce the number of variables of a problem without having a significant loss of information, see, e.g., Shlens (2014) for a review.PCA is based on the search for existing correlations between the original variables involved, so as to find a new system to represent the data.The axes of this new system will be vectors that are linearly independent of each other, i.e. geometrically orthogonal, and oriented in the direction where they are able to encompass the largest possible variance of the processed data.This linear transformation is based on the eigenvector decomposition of the covariance matrix  of the variables involved, where  is the data set in a matrix form, taking into account that  ∈  × and  ∈   × ,  the size of the dataset,  is the number of variables describing each member of the dataset, and X is the mean value of the variable.Furthermore,  will by definition be a symmetric and positive semi-definite matrix which contains on its main diagonal the variance of the variables used.The eigenvector decomposition can then be done as, where the eigenvectors ì  found from the matrix  together with their eigenvalues , allow one to perform the above transformation from the original set of variables to the set of principal components (hereafter PCs).In this way, the eigenvectors ì  are the new axes on which the new dimensional space will be built, which have the intrinsic property of maximizing the variance of the treated data.Each ì  will have an associated  according to Eq. ( 7).The eigenvalues serve to measure in which direction the data is most dispersed.Thus, it is useful to use the ratio of each  to the sum total of all eigenvalues to define what is known as explained variance.Said otherwise, the explained variance ratio is the percentage of variance that is associated with each of the PCs.The larger  is, the greater the explained variance covered by the resulting ì  will be.The cumulative explained variance is the sum of each explained variance.The new space may at most be −dimensional, when all PCs have a non-null contribution to the variance of the population.This final set of PCs will be obtained by relating the original set of variables to the ì  that have been defined.This linear transformation follows where   is the transpose matrix denoted in Eq. ( 6).In this way, as ì  has dimension (1 × ) and   has dimension ( × ), the result is a row vector of size (1 × ) that will contain the   values for each member of the space.
Algebra with  1 and  2 The variables used depend on powers of  and , and therefore once the logarithm is taken, the expressions defining each variable are linear.For example, log   =  + log  + log , where  is a constant (i.e. the equation of a plane,  +  +  +  = 0).The eigenvalues associated with the principal components beyond  1 and  2 will be strictly null and therefore all the explained variance is accumulated in the first two PCs.Eqs.
(2) and ( 3) are expressed as a function of the normalization of the logarithm of the variables, (  ) † where the sub-index  indicates the logarithm of the generic variable .Here, given that any of the variables we are considering can be written as  (, ) =      ,   = log  =   +   +    , the normalization is Once Eq. ( 9) is applied to all the variables, a summation will be obtained which leads to Eqs. ( 4) and ( 5).
We now consider Fig. 3, where we show how a circle in the  diagram would look in the principal components plane.The simulation of points located along a circle in the , -diagram, defined by its logarithmic coordinates (  ,   ), can be done by considering a fixed value  2 = Δ 2  + Δ  2  , where Δ is used to represent that the circle can be displaced from the origin.Introducing Eqs. ( 4) and ( 5) into it, we get: which corresponds to an ellipse (i.e.,  2 +  +  2 +  +  + = 0) with an angle of rotation cot(2) = (  − )/).Starting counterclockwise from the positive axis of  1 this leads to  = 73.38 • .
•  is minimally connected: there is only one path to connect any two vertices of , so by removing any edge  it would be disconnected.
•  is maximally acyclic: adding an edge to  would form a cycle.
The minimum spanning tree (MST) is a sub-graph of  such that the sum of all weights, () ∈ (), is the smallest possible.In this way, the MST connects all the nodes  taking into account the minimum distance between them at a local level, but under the condition that there is a global minimization for connecting the whole population of nodes.The solution of this kind of problem is obtain via greedy algorithms, which search the best possible solution in each iteration (Roughgarden 2019).Using  (, ), such an algorithm joins  ∪   (where   as the edge with the smallest possible weight contained in ), at every step, unless the addition of this causes a cycle (see e.g., Wilson (2010)).Here we use a special case of the greedy algorithm, called Kruskal's algorithm (see eg., (Kruskal 1956)) for getting an MST out of the complete .To implement this algorithm, the following steps must be carried out • Order the edges by increasing weight.
• Choose the edge with the smallest () and add it to  .
• Make sure that the chosen  does not produce any cycle in the structure of .
• The process finishes when all nodes  are connected, thus resulting in a graph  (,  ,  ), where | | = | | − 1 and  () is the weight of  according to Eq. ( 12) The description of this algorithm would follow the pseudocode shown (see eg., Erickson (2019)).This description is supported by the im-Algorithm Kruskal  (, , ) Require: sort  in increasing order according to () end if 11: end for 12: return  plementation of the algorithm based on a union data structure, which operates on disjoint subsets of , having the ability to support Make-Set, Find and Union operations.Note that MakeSet generates a subset for each node .On the other hand, the Find operation returns an identifier for the subset to which  belongs.Finally, Union decreases the number of subsets by merging those containing   and   .To exemplify the structures of  and  as well as the complexity of working with a high number of nodes, Fig. 11 shows a worked example of a complete, undirected and weighted graph  with 158 nodes and 12403 edges (i.e., 158 × 157/2), using as weights the values of the Euclidean distances between nodes according to Eq. ( 11).It can be seen that the number of edges is a quickly growing function of the number of nodes, and so are the computational time requirements.For example, the computational time to calculate the MST with 8 variables is about 288s.The right panel of Fig. 11 shows , the MST of , where we found only 157 edges in addition to the 158 nodes.

MST properties
Below we list properties associated with the MST (proofs can be found in graph theory books, e.g., Tarjan (1983); Kleinberg & Tardos (2005); Erickson (2019)).Considering that  (,  ) is the MST of  (, ), the main properties are: • Uniqueness: If all () of  () are distinct values, the resulting MST will be unique.This implies that the choice of any other greedy algorithm instead of Kuskal's as a solution to search for the MST (e.g.Prim's algorithm Prim (1957)) would give the same results.
• Cycle property: From the construction of  itself, any edge  ∈  such that  ∪ {} creates a cycle  in .On the other hand, if for some   ∈  it is determined that T(,  ∪ {} − {  }) is a spanning tree.Thus, if  is an MST then (  ) > (), otherwise, the edge with the largest weight contained in  will not be contained in  for  to remain an MST.Therefore, every node  ∈  has among its incident edges the  with the smallest value ().
• Cut property: From the construction of  itself, any edge  ∈  such that  (,  − {}) will cause a cut in  separating it into 2 connected components.For the resulting cut set, denoted as , it is observed that for any   ∈  such that  (,  − {} ∪ {  }) is a spanning tree.Thus, if  is an MST then (  ) > (), otherwise, the lightest edge contained in  must be contained in  for  to remain an MST.

Nearness in the context of different distances
Nearness between two pulsars depends on the definition of distance, and of the variables considered to compute it, thus the more complete this distance is, the better.One way to see this is by comparing the distributions of the weights associated to a complete, undirected and weighted graph using only ,  with those obtained with the full set of 8 magnitudes considered in the text (equivalently, their two principal components).This is shown in Fig. 12, and relates to the discussion of the bottom panels of Fig. 3.The distributions are different.In addition, just from the full set of weights () it is possible to know which would be the nearest neighbours of any randomly chosen pulsar.It can be seen that neighbouring pulsars differ according to each distance definition.Considering the nearest three neighbours for every pulsar using a distance based on ,  and comparing with the ones obtained using the full set of variables of interest, we find that 45% of the population incorporates a new pulsar even in the first three places of the ranking.The ranking positions, even when the same pulsars are concerned, may change in many cases: 40% of the population has at most one pulsar in the same position.

Visual nearness in the MST and ranking
Due to MST properties, we know that the first neighbour in the distance ranking from a given source is indeed one of the attached nodes in the MST.However, the second neighbour and beyond in the distance ranking do not necessarily reflect vis-a-vis the visual appearance of the MST.Thus, visual nearness from a given source in the MST is not a one-to-one overall translation of the distance ranking from that source.The MST does not minimize the distance to a given node only, but the total distance needed to visit the whole population.This introduces a global perspective in the selection of how particular sources are connected to the rest, which is what ultimately serves for clustering analysis techniques.

Representing the MST: how rendering is done
Given that an MST lacks a fixed set of axes, the rendering of the MST (angles among branches, orientation, branch direction) does not hold any particular meaning, only the connections among nodes do.A different representation could be chosen conserving the same properties (e.g., each node being linked to the same neighbours, being of the same order, preserving the same sequence of all variables along all branches, etc.).The nearness and connections described in one MST would be the same as in the other, despite the overall appearance differing.An obvious example is to take one of the branches that go rightwards in our MST and force it to go leftwards.This would produce exactly the same MST (the same adjacency matrix, in technical terms), and everything we have said looking to the MST with the same branch pointing rightwards would apply for the new rendering as well.It is not the appearance of the overall graph what is important, but the graph properties.In our representation we are using the program that uses the Graphviz python library (The graphviz team 2022), which searches the minimization of a pseudoenergy to select the orientation of the different components.We have verified using earlier versions of the ATNF catalog how the global appearance of the MST changes whenever a significant number of nodes are added (usually, the rendering does not change when adding a few nodes).Even when the visual appearance of the tree may differ, all similarity properties (e.g., the relative localization of members of sub-populations) and physical connections described are maintained.

Code implementation
Our code is built on python v 3.10.4(Van Rossum & Drake Jr 1995), in which we implement the Psrqpy package (Pitkin 2018) to deal with the population of pulsars from the ATNF catalog.To create the graphs we use the NetworkX (Hagberg et al. 2008) and the Graphviz (The graphviz team 2022) libraries.On the other hand, the SciPy library (Virtanen et al. 2020) contains Kruskal's algorithm according to which we have been able to calculate the MST.For the application of the PCA, we used the Scikit-learn library (Buitinck et al. 2013).The previous libraries and packages contain as requirements other well-known libraries such as NumPy (Harris et al. 2020), Pandas (McKinney et al. 2010) and Matplotlib (Hunter 2007) through which, in addition, we have been able to develop those parts of the code that were necessary to obtain the results seen.The app is done using Bokeh (The Bokeh team 2022).
This paper has been typeset from a T E X/L A T E X file prepared by the author.

Figure 1 .
Figure 1.Distribution of the logarithm of the 8 variables used for the complete set of 2509 pulsars.

Figure 2 .
Figure 2. PCA results using the logarithm of the set of variables for the whole population of 2509 pulsars.The left panel shows the explained variance contributed by each of the PCs according to the eigenvalues of the covariance matrix.It represents the amount of information contained in each PC.The central panel shows the cumulative variance explained by the new set of variables which has been defined through the PCA analysis.The right panel shows the 'weight'that each variable has with respect to each PC.This value is the coefficient that is held in each eigenvector.Negative values imply that the variable and the PC in question are negatively correlated.Conversely, a positive value shows a positive correlation between the PC and the variable.
Fig. 9 consistently shows how the pulsars associated with TeV confirmed or candidate pulsar wind nebulae (PWNe, H. E. S. S. Collaboration et al. (2018)) are essentially all located in these branches.Only three TeV PWNe, B1742-

Figure 3 .
Figure 3. Top-Left panel: Representation of the total pulsar population as a function of the  and  variables.Constant values of  1 and  2 are shown in green and brown lines, respectively, according to Eqs. (4) and (5).Top-right panel: Representation of the total pulsar population as a function of  1 and  2 whose values are obtained according to the 8 variables taken into account as shown in Eqs.(2) and (3).Constant values of  and  are shown in green and brown lines, respectively, taking into account the Eqs.(4) and (5).Bottom-left panel: Synthetic pulsars positioned circularly at different radii from a given centre.Bottom-right panel: Transformation of the circle through the Eqs.(4) and (5).

Figure 4 .
Figure 4.A different look at the pulsar population.MST-graph (2509MST-graph ( , 2508) )  based on the complete, undirected and weighted graph  (2509, 3146286) for the 2509 pulsars and their full combination of weights computed from their Euclidean distance among 8 normalized variables (or the equivalent 2 PCs).Each node in the MST represent a pulsar.Branches group pulsars with particular characteristics.

Figure 5 .
Figure5.Top two rows: In the left, nodes from different branches are highlighted in dark blue in the MST.The middle and right panels show the behavior of the 8 normalized variables and of the 2 principal components when following an arbitrary mixing of the nodes in the highlighted part of the MST.Third, fourth and fifth rows: the same, but now arbitrarily mixing the nodes of only one branch.Note that for the purpose of plotting both  1 and  2 in the same scale the right panels of Fig.5, and also Fig.6, we have subtracted the corresponding mean -from the whole set-from each set of values.

Figure 6 .
Figure6.Similar to the panels in Fig.5, but here following the node sequence as appearing along the marked branch of the MST.The node sequence is here read from left to right in branches that appear mostly vertical, where they are read from top to bottom).

Figure 8 .
Figure 8. Representation of the values of the different pulsar magnitudes considered directly onto the MST.The first row shows ,  and ; the second row shows the magnetic field the surface and at the light cylinder; the third row shows the spin-down power, the voltage and the Goldreich-Julian current, and finally the last row shows directly the principal components  1 and  2 values.

Figure 9 .
Figure 9. Representation of the members of different pulsar classes onto the (  1 ,  2 ) MST.The first row shows magnetars; redbacks, black widows and transitional pulsars, and pulsars associated with TeV pulsar wind nebulae.The first two panels second row shows the Fermi-LAT pulsars (thus the superscript F), further characterized against their spin-down power and light cylinder radius, as depicted in Fig. 8.The last panel shows the RRATs.

Figure 10 .
Figure 10.Representation of evolutionary tracks described in Viganò et al. (2013) into the MST, according to their initial magnetic field.