Tag: Text file

  • Mapping Words to Line Numbers in Text Files in STL / C++

    Following on from the previous post, this example shows how to use an STL multimap to track the line number(s) associated with each word in a text file.

    This program reads the text line by line, replacing punctuation and every other non-alphabetic character with a space so that only words remain. Each (word, line number) pair is then inserted into the multimap container using the insert function.

    As with the previous posting, which deals with counting the frequency of words in a file, this example also uses the sample Hamlet.txt file.

    Code listing as follows:

    #include <cctype>
    #include <fstream>
    #include <iostream>
    #include <map>
    #include <sstream>
    #include <string>
    
    using namespace std;
    
    int main()
    {
        const string path = "/home/andy/NetBeansProjects/Hamlet.txt"; //Linux
        //const string path = "C:\\Dump\\Hamlet.txt";   
        ifstream input( path.c_str() );
    
    	if ( !input )
    	{
    		cerr << "Error opening file." << endl;
    		return 1;   // non-zero exit code signals failure
    	}
    
    	multimap< string, int, less<string> >  words;
    	int line;
    	string word;
    
    	// For each line of text (a std::string has no fixed length limit)
    	string text;
    	for ( line = 1; getline( input, text ); line++ )
    	{
    		// Replace every non-alphabetic character with a space, leaving only words
    		for ( string::size_type k = 0; k < text.size(); k++ )
    		{
    			if ( !isalpha( static_cast<unsigned char>( text[ k ] ) ) )
    				text[ k ] = ' ';
    		}
    
    		istringstream i( text );
    
    		// Testing the extraction itself stops the last word being inserted twice
    		while ( i >> word )
    		{
    			words.insert( pair<const string,int>( word, line ) );
    		}
    	}
    
    	input.close();
    
    	// Output results
    	multimap< string, int, less<string> >::iterator it1;
    	multimap< string, int, less<string> >::iterator it2;
    
    	for ( it1 = words.begin(); it1 != words.end(); )
    	{
    		it2 = words.upper_bound( (*it1).first );
    
    		cout << (*it1).first << " : ";
    
    		for ( ; it1 != it2; it1++ )
    		{
    			cout << (*it1).second << " ";
    		}
    		cout << endl;
    	}
    	
    	return 0;
    }
    

    Running the program prints each word followed by the line numbers on which it occurs. Notice that a word occurring more than once on a line has that line number listed once per occurrence.
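
    To look up the line numbers for a single word rather than printing the whole map, the multimap's equal_range function can be used. The following is a minimal sketch only: the names WordIndex, buildIndex and linesFor are hypothetical helpers introduced for illustration, and the index is built from any input stream so that it can be tried on a short string instead of Hamlet.txt.

    ```cpp
    #include <cctype>
    #include <istream>
    #include <map>
    #include <sstream>
    #include <string>
    #include <utility>
    #include <vector>

    // Hypothetical alias for the word -> line number container
    typedef std::multimap<std::string, int> WordIndex;

    // Hypothetical helper: build the index from any input stream
    WordIndex buildIndex( std::istream& input )
    {
        WordIndex words;
        std::string text;

        for ( int line = 1; std::getline( input, text ); ++line )
        {
            // Replace every non-alphabetic character with a space
            for ( std::string::size_type k = 0; k < text.size(); ++k )
            {
                if ( !std::isalpha( static_cast<unsigned char>( text[ k ] ) ) )
                    text[ k ] = ' ';
            }

            std::istringstream tokens( text );
            std::string word;

            while ( tokens >> word )
            {
                words.insert( std::make_pair( word, line ) );
            }
        }

        return words;
    }

    // Hypothetical helper: collect the line numbers stored for one word
    // (returns an empty vector if the word is not in the index)
    std::vector<int> linesFor( const WordIndex& words, const std::string& word )
    {
        std::vector<int> lines;
        std::pair<WordIndex::const_iterator, WordIndex::const_iterator> range =
            words.equal_range( word );

        for ( WordIndex::const_iterator it = range.first; it != range.second; ++it )
        {
            lines.push_back( it->second );
        }

        return lines;
    }
    ```

    For example, passing an istringstream (or the ifstream from the listing above) to buildIndex and then calling linesFor( index, "be" ) returns one entry per occurrence of "be". Keys are case-sensitive, so "To" and "to" are stored as different words, just as in the listing above.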

    Related post: Counting the Number of Words in a Text File in STL / C++

  • Getting Started with Python

    Reading and analyzing data from a text file:

    My gentle introduction to writing a Python script. I needed a script to read in a log file and extract all inkjet printhead temperature values, in degrees Celsius, recorded for the duration of the print run. Very simple.