Abstract
Pneumococcal conjugate vaccines have reduced the incidence of invasive pneumococcal disease caused by
vaccine serotypes, but non-vaccine serotypes remain a concern. We used whole-genome sequencing to study
pneumococcal serotype, antibiotic resistance and invasiveness in the context of genetic background.
Methods: Our dataset of 13,454 genomes, combined with four published genomic datasets, represented Africa (40%),
Asia (25%), Europe (19%), North America (12%), and South America (5%). These 20,027 pneumococcal genomes were
clustered into lineages, named Global Pneumococcal Sequence Clusters (GPSCs), using PopPUNK. From our
dataset, we additionally derived serotype and sequence type, and predicted antibiotic sensitivity. We then measured
invasiveness using odds ratios relating prevalence in invasive pneumococcal disease to prevalence in carriage.
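The invasiveness measure can be illustrated with a minimal sketch. It assumes that each serotype (or GPSC) is summarised as a 2×2 table of isolate counts: isolates of that type versus all other types, from invasive disease versus from carriage. The function name and arguments are hypothetical. The 0.5 continuity correction for zero cells and the Woolf (log-scale) 95% confidence interval are illustrative choices, not necessarily those used in this study.

```python
import math


def invasiveness_odds_ratio(ipd_type, ipd_other, carr_type, carr_other, z=1.96):
    """Hypothetical helper: odds of a type occurring in invasive disease vs carriage.

    ipd_type / ipd_other:   invasive isolates of this type / of all other types
    carr_type / carr_other: carriage isolates of this type / of all other types
    Illustrative assumption: if any cell is zero, 0.5 is added to every cell
    (Haldane correction) so that the odds ratio and its log are defined.
    Returns (odds_ratio, lower, upper), using a Woolf log-scale confidence interval.
    """
    cells = [ipd_type, ipd_other, carr_type, carr_other]
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)          # (a/b) / (c/d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Made-up counts, for illustration only
    or_, lo, hi = invasiveness_odds_ratio(30, 70, 10, 190)
    print(f"{or_:.2f}")  # prints 8.14
```

Under this framing, an odds ratio above 1 means the type is over-represented in invasive disease relative to carriage. A value below 1 means it is mostly carried without causing invasive disease.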