Thread: [Dev-C++] Choose a word from a .txt file
From: Alexsandro M. <ale...@ho...> - 2008-06-03 04:24:23
Dear users,

How can I write a script that looks up a word in a .txt file with two columns (the first holds the word, the second holds the answer I need) and returns the answer from the second column? In other words, the user types a word, the script matches it against a word in the first column of the .txt file, and gives the answer from the second column.

Thanks in advance!

Alex.
From: Lloyd <ll...@cd...> - 2008-06-03 04:47:10
Wouldn't a regular expression search help you? There may be a better solution, though.

Check the regex library at www.boost.org.
From: Per W. <pw...@ia...> - 2008-06-03 08:33:26
No regular expressions if he is looking for an exact match. Regular expressions are there to look for patterns. A normal strcmp()/strncmp() is faster when looking for exact matches.

/pwm
From: Münt, B. <Ber...@eu...> - 2008-06-03 09:35:37
> No regular expressions if he is looking for an exact match. Regular
> expressions are there to look for patterns. Normal strcmp()/strncmp() is
> faster when looking for exact matches.

No, not at all. You described an algorithm that looks in the middle of the file and then, depending on whether the word found there is less or greater, continues in the lower or the upper half, and so on. That needs a lot of lookups if the file is huge.

A regex with the multiline option can find the answer in one line:

/^WordToFind\s*?(.*)/m

You will find the answer in capture group 1, if there is an answer.

Regards, BM
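For comparison, the regex idea can be sketched in C++ roughly as follows. This is only an illustration under several assumptions: std::regex (C++11) stands in for the Boost library mentioned earlier in the thread, the pattern is applied line by line instead of using the multiline option, and the helper names escape_regex and lookup_regex are made up for this example. The pattern also differs from the one posted: it uses \s+ instead of \s*?, because the lazy \s*? would leave the leading whitespace inside the capture and would also accept any longer word that merely starts with WordToFind.

```cpp
#include <istream>
#include <regex>
#include <string>

// Hypothetical helper: backslash-escape regex metacharacters so that the
// user's word is matched literally.
std::string escape_regex(const std::string& text) {
    static const std::regex meta(R"([.^$|()\[\]{}*+?\\])");
    return std::regex_replace(text, meta, R"(\$&)");
}

// Hypothetical helper: scan the stream line by line and return the second
// column of the first line whose first column equals `word`.
bool lookup_regex(std::istream& in, const std::string& word, std::string& answer) {
    // The whole line must be: word, at least one whitespace character,
    // then the answer (capture group 1).
    const std::regex pattern(escape_regex(word) + R"re(\s+(.*))re");
    std::string line;
    std::smatch match;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // tolerate CRLF line endings
        }
        if (std::regex_match(line, match, pattern)) {
            answer = match[1].str();
            return true;
        }
    }
    return false;
}
```

Passing an std::ifstream opened on the word file (or an std::istringstream for a quick experiment) is enough to try it. Note that this is still a linear scan of the file, so the runtime grows with the file size just as it does for a plain strncmp() loop.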
From: Per W. <pw...@ia...> - 2008-06-03 12:12:31
The question here isn't the number of source lines needed to perform the job. Why? Because you don't need any source lines at all if you just use the normal Unix tools: you can get an answer directly on the command line. But this is a C/C++ mailing list, since it is about Dev-C++. So the question should be: how do you actually perform the job in C/C++ source code, instead of letting existing command-line tools do it?

If you have a nail, should you use a hammer or a sledgehammer? Or maybe you should blast it into the plank with dynamite? The Hilti nail guns go the explosive way, but with the disadvantage that it doesn't leave much for a DIY guy.

Calling a magic regexp library doesn't tell anyone what is happening. Regexps are cool, but regular expressions are meant to perform pattern matching. You don't have much need for pattern matching in this case and, more importantly, you are ignoring how to solve the problem, since you let someone else solve it for you.

Next step: you complain about my tree algorithm. I suggested both a linear approach and a tree algorithm, mentioning the tree algorithm for large sorted files. If you have one billion words, the text file may be 100 billion bytes large. Guess how fast your regexp scan of the full file would be. Guess what happens when a tree algorithm cuts the search range in two at each iterative step? A naive implementation would solve for one billion words in at most 30 tests. A slightly better approach would switch from tree to linear search once you have narrowed the range down to a block of maybe a couple of kB of data, matched to n * block size for the file system or memory subsystem.

Now, assume 20 MB/s for the regexp solution and 0.1 s per test for the tree search. The regexp would then need 2,500 seconds on average (twice that in the worst case), while the tree search would take <= 3 seconds worst-case.
Hence, the simplest code doesn't always win: you must know what your requirements are before you can select a good (or, almost always, a "good enough") algorithm.

Anyway, please compare the runtime and the size of the application binary for a compiled regexp solution with the following:

    #include <stdio.h>
    #include <string.h>

    /* Placeholder values: the original post left these three undefined. */
    #define MAX_WORD_LEN 64
    #define MAX_LINE_LEN 1024
    static const char *fname = "words.txt";

    void scan(const char *my_word)
    {
        char tmp_match[MAX_WORD_LEN + 10];
        char linebuf[MAX_LINE_LEN], *p1, *p2;
        size_t match_len;
        int found = 0;
        FILE *f;

        match_len = strlen(my_word);
        if (match_len > MAX_WORD_LEN) {   /* avoid overflowing tmp_match */
            printf("Word too long: %s\n", my_word);
            return;
        }

        /* Build "word " so that only a complete first column matches. */
        strcpy(tmp_match, my_word);
        strcpy(tmp_match + match_len++, " ");

        f = fopen(fname, "rt");
        if (f) {
            while (fgets(linebuf, sizeof(linebuf), f)) {
                if (!strncmp(linebuf, tmp_match, match_len)) {
                    /* Skip extra spaces, then cut off the trailing newline. */
                    p1 = linebuf + match_len;
                    while (*p1 == ' ')
                        p1++;
                    p2 = strchr(p1, '\n');
                    if (p2)
                        *p2 = '\0';
                    printf("Answer for %s: %s\n", my_word, p1);
                    found = 1;
                    break;
                }
            }
            fclose(f);
            if (!found)
                printf("No match found for %s\n", my_word);
        } else {
            printf("Failed opening text file\n");
        }
    }

Note: A real implementation should call setbuffer() or setvbuf(), or alternatively use ioctl() or similar, and specify a one-pass, read-only scan without seek. But that isn't part of a search algorithm.

/pwm
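The back-of-the-envelope comparison in this message can be re-derived with a few lines of C++. This is only a sketch of the arithmetic: the function names are invented for illustration, "MB" is assumed to mean 10^6 bytes, and the inputs (one billion words, 100 billion bytes, 20 MB/s, 0.1 s per test) are the figures quoted in the message, not measurements.

```cpp
#include <cmath>

// Worst-case number of halving steps needed to isolate one entry among
// `entries` sorted entries: ceil(log2(entries)).
int bisection_steps(double entries) {
    return static_cast<int>(std::ceil(std::log2(entries)));
}

// Seconds needed to stream `bytes` sequentially at `bytes_per_second`.
double scan_seconds(double bytes, double bytes_per_second) {
    return bytes / bytes_per_second;
}

// Worst-case seconds for the bisection search at `seconds_per_test` each.
double bisection_seconds(double entries, double seconds_per_test) {
    return bisection_steps(entries) * seconds_per_test;
}
```

With those inputs, scan_seconds(100e9, 20e6) gives 5,000 seconds for a full pass over the file, so about 2,500 seconds for an average hit halfway through, while bisection_steps(1e9) gives 30 tests, or roughly 3 seconds at 0.1 s each.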
From: Pranav N. <pr...@pr...> - 2008-06-03 09:44:28
I think what Per refers to is the overhead of initializing a regex library and calling it. What appears as a single function call is probably several hundred lines of code underneath. And for small programs it's often better to leave out library dependencies and reduce compilation times.

Just my 2 cents.

Pranav Negandhi
www.pranavnegandhi.com
From: Per W. <pw...@ia...> - 2008-06-03 08:32:09
Why do you talk about a script? This is the Dev-C++ list, where people normally use Dev-C++ to write C or C++ applications. The traditional use of the word 'script' is for some interpreted programming, often in a batch file or similar.

Anyway, is the file sorted? Does it contain a huge number of words?

If the answer to either of those two questions is no:
  Open the file.
  Read one line into a buffer (for example with fgets()).
  Find the first separator character (space, tab or whatever the file is using).
  Replace the separator with a '\0'.
  Compare the start of the line with the word you want to match.
  If there is no match, load the next line and repeat.
  If there is a match, step past the separator and then eat any further separator characters.
  Locate the '\n' at the end of the read line and replace it with a '\0'.
  Emit the extracted answer.

If the file is huge and sorted:
  Get the file size.
  Move to the middle of the file.
  Read one (possibly partial) line.
  Read the next line.
  Extract the word and see if you are above or below the requested word.
  If you found the word, extract the answer.
  If the word is too large, move back to one quarter of the file and repeat.
  If the word is too small, seek to 75% of the file and repeat.

All the time, remember the <min,max> range that the expected word must be in. Note that <min> and <max> should represent the start positions of text lines (or the end-of-file), so when you specify <min> as the seek offset you do not have to throw away a potentially partial text line: you know that it is a full line and can check that word immediately.

/pwm
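The sorted-file search described in this message can be sketched as below. It is a minimal illustration, not the poster's own code: the names split_line and lookup_sorted are hypothetical, the file is assumed to be sorted by plain byte order of the first column (the order strcmp() uses), and the separators are assumed to be spaces or tabs. The variables lo and hi play the role of <min> and <max>: lo is always the start of a line, and only lines starting before hi are still candidates.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: split "word<separators>answer" at the first space or tab.
static void split_line(std::string line, std::string& key, std::string& answer) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
    const std::string seps = " \t";
    const std::string::size_type key_end = line.find_first_of(seps);
    key = line.substr(0, key_end);
    const std::string::size_type ans_start =
        (key_end == std::string::npos) ? std::string::npos
                                       : line.find_first_not_of(seps, key_end);
    answer = (ans_start == std::string::npos) ? std::string() : line.substr(ans_start);
}

// Bisect on byte offsets. Returns true and fills `answer` if `word` is found.
bool lookup_sorted(const std::string& path, const std::string& word, std::string& answer) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) return false;
    in.seekg(0, std::ios::end);
    std::streamoff lo = 0;           // <min>: start of the first candidate line
    std::streamoff hi = in.tellg();  // <max>: candidate lines start before this
    std::string line, key;
    while (lo < hi) {
        const std::streamoff mid = lo + (hi - lo) / 2;
        std::streamoff start = 0;
        in.clear();                  // reset EOF/fail state from earlier reads
        if (mid > 0) {
            // Step back one byte and discard up to the next '\n'; what follows
            // is the first full line starting at or after mid.
            in.seekg(mid - 1, std::ios::beg);
            std::getline(in, line);
            start = mid - 1 + static_cast<std::streamoff>(line.size()) + 1;
        } else {
            in.seekg(0, std::ios::beg);
        }
        if (start >= hi || !std::getline(in, line)) {
            hi = mid;                // no candidate line starts in [mid, hi)
            continue;
        }
        const std::streamoff next = start + static_cast<std::streamoff>(line.size()) + 1;
        split_line(line, key, answer);
        const int cmp = key.compare(word);
        if (cmp == 0) return true;
        if (cmp < 0) lo = next;      // word sorts after this line
        else hi = mid;               // word sorts before this line
    }
    answer.clear();
    return false;
}
```

Offsets are computed from the line lengths instead of calling tellg() after each read, since tellg() can report failure once a read has hit the end of the file. The sketch keeps bisecting all the way down; switching to a linear scan once the range is a few kB wide would cut the number of seeks further.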