The most common data structures in n-gram language models are tries and hash tables. Kenneth Heafield's paper on his language model toolkit, KenLM, gives more detailed information about the data structures used by KenLM and related packages.
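
To make the two structures concrete, here is a minimal sketch that stores n-gram counts both in a trie and in a flat hash table. It is only an illustration, not how KenLM implements either structure; the names `TrieNode`, `NgramTrie`, and `count_ngrams` are made up for this example.

```python
class TrieNode:
    """One node per word; the path from the root spells out an n-gram."""

    def __init__(self):
        self.children = {}  # next word -> TrieNode
        self.count = 0      # how often the n-gram ending at this node was seen


class NgramTrie:
    """Hypothetical toy trie mapping n-grams (tuples of words) to counts."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, ngram):
        # Walk down the trie one word at a time, creating nodes as needed.
        node = self.root
        for word in ngram:
            node = node.children.setdefault(word, TrieNode())
        node.count += 1

    def count(self, ngram):
        # Follow the path for the n-gram; an unseen n-gram has count 0.
        node = self.root
        for word in ngram:
            node = node.children.get(word)
            if node is None:
                return 0
        return node.count


def count_ngrams(tokens, max_order):
    """Count all n-grams up to max_order in a trie and in a plain dict."""
    trie = NgramTrie()
    table = {}  # hash table: n-gram tuple -> count
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            ngram = tuple(tokens[i:i + n])
            trie.add(ngram)
            table[ngram] = table.get(ngram, 0) + 1
    return trie, table


tokens = "the cat sat on the mat".split()
trie, table = count_ngrams(tokens, max_order=2)
```

The hash table answers each lookup with a single probe on the whole n-gram, while the trie shares common prefixes, so all n-grams starting with the same words reuse the same nodes.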

The likelihood function gives the probability of generating a set of training data given some parameters, and it can be used to find the parameters that generate the training data with maximum probability. You can create the likelihood function for a subset of the training data, but that wouldn't be represent...
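
As a rough sketch of the likelihood idea, assume the simplest possible model, a unigram model in which every word is generated independently. The parameters are then the word probabilities, and the maximum likelihood estimate is just the relative frequency of each word. The toy data and the function names `log_likelihood` and `mle_unigram` are made up for this example.

```python
import math
from collections import Counter


def log_likelihood(tokens, probs):
    """Log-probability that a unigram model with `probs` generates `tokens`."""
    return sum(math.log(probs[t]) for t in tokens)


def mle_unigram(tokens):
    """Maximum likelihood parameters of a unigram model: relative frequencies."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: c / total for word, c in counts.items()}


tokens = "a b a a b c".split()  # toy training data

mle = mle_unigram(tokens)
uniform = {word: 1 / len(mle) for word in mle}  # an alternative parameter setting

ll_mle = log_likelihood(tokens, mle)
ll_uniform = log_likelihood(tokens, uniform)

# The MLE parameters make the training data more probable than the uniform ones.
print(ll_mle > ll_uniform)
```

The log is used only because multiplying many small probabilities underflows quickly; maximizing the log-likelihood picks the same parameters as maximizing the likelihood itself.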