Perl Project: Siddhant Sanjeev 337/CO/11 Siddharth Saluja 338/CO/11

This document describes a Perl project to mine code from webpages related to a search query. It uses the Bing search API to retrieve URLs, then extracts C/C++ code snippets from the pages through regex pattern matching. The code is organized and output to an HTML file. A GUI allows users to enter queries and see results. Key aspects include using various Perl libraries, building the regex patterns, removing JavaScript, and outputting the filtered code snippets.

Uploaded by

sansid12

PERL PROJECT

Submitted by:-
SIDDHANT SANJEEV
337/CO/11
SIDDHARTH SALUJA
338/CO/11

Aim
To mine code related to a given topic from the web.
To give programmers faster access to code relevant to their query.
To reduce reliance on Google searches by using our application instead.

Technical procedure
Created a Windows Live account to use the Bing Search API.
The API was given a query and returned all the relevant URLs.
The data was fetched in JSON format, and the URL and description of each webpage were extracted and stored.
Each URL was fetched and the page content was matched against a regex to extract all the C/C++ code on that page.
The regex matches declarations that begin with types such as int, double, void and long.
The regex also finds conditionals (if, else) and iterative loops such as for, while and do-while.
The extracted code was written in an organized form to an HTML file called out.html.
To make the I/O interactive, we built a GUI with Tk, the graphics library of Perl: the query is entered in a text box and the user is notified when the query completes.

Header Code
use LWP::UserAgent;
use Data::Dumper;
use JSON;
use Text::Balanced qw(extract_codeblock);
use Tk;
Header code explanation
LWP::UserAgent is a class implementing a web user agent. LWP::UserAgent objects are used to dispatch web requests.
Data::Dumper converts Perl data structures to readable text, which is useful for printing them on the screen while debugging.
The JSON library is used to decode the web data received in JSON format. Alternatively, XML could also be used.
Text::Balanced provides extract_codeblock, which extracts a block of text enclosed in balanced delimiters such as { }.
The Tk library is the graphics library of Perl, used to add text boxes and dialog boxes that make the program more interactive.
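
As a rough illustration of how the JSON and Data::Dumper modules work together, the sketch below decodes a small response and prints it. The sample response and the parse_results helper are hypothetical; only the d, results, DisplayUrl and Description keys come from the project code, and the Url key is an assumption. JSON::PP is used here only because it ships with Perl; its decode_json plays the role that from_json plays in the project.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);    # core module, stands in for the JSON module
use Data::Dumper;

# Hypothetical response body, shaped like the fields the project reads.
my $json = <<'END_JSON';
{"d": {"results": [
    {"Url": "http://example.com/a", "DisplayUrl": "example.com/a", "Description": "First page"},
    {"Url": "http://example.com/b.pdf", "DisplayUrl": "example.com/b.pdf", "Description": "Second page"}
  ],
  "__next": "http://example.com/next"}}
END_JSON

# Decode the body and return the list of result hashes (empty if missing).
sub parse_results {
    my ($body) = @_;
    my $data = decode_json($body);
    return @{ $data->{d}{results} || [] };
}

my @results = parse_results($json);

# Data::Dumper shows the whole structure of one result for debugging.
$Data::Dumper::Sortkeys = 1;
print Dumper( $results[0] );

# Normal access goes through the hash keys.
print "$_->{DisplayUrl}: $_->{Description}\n" for @results;
```

Dumping one decoded result is a quick way to check which keys the API really returns before relying on them.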

GUI code
$mw = MainWindow->new;
$mw->title("FINDER");
$frm_name = $mw->Frame();
$lab = $frm_name->Label( -text => "Enter Query:" );
$ent = $frm_name->Entry();
$but = $mw->Button( -text => "Search", -command => \&button_handler );
$textarea = $mw->Frame(); #Creating Another Frame
$txt = $textarea->Text( -width => 100, -height => 10 );
$srl_y = $textarea->Scrollbar( -orient => 'v', -command => [ yview => $txt ] );
$srl_x = $textarea->Scrollbar( -orient => 'h', -command => [ xview => $txt ] );
$txt->configure(
-yscrollcommand => [ 'set', $srl_y ],
-xscrollcommand => [ 'set', $srl_x ]
);
$lab->grid( -row => 1, -column => 1 );
$ent->grid( -row => 1, -column => 2 );
$frm_name->grid( -row => 1, -column => 1, -columnspan => 2 );
$but->grid( -row => 4, -column => 1, -columnspan => 2 );
MainLoop;

GUI code explanation
MainWindow->new creates the main application window, to which widgets such as text boxes and buttons are added.
$mw->title sets the title of the window.
$lab is the label that prompts for the query, and $ent is the entry box in which the query is typed.
$txt is a text area used to display status messages to the user.
$but is the button which, when clicked, invokes the button_handler function that runs the search.

Button handler code
sub button_handler
{
$input = $ent->get();
$txt->insert( "end", "You searched for $input\n" );
$txt->update();

$accnt_key = 'hO+FNgghOI5lq3i5TILA4TFVKHBdtLsXZBFj67UaeMw';
$root = 'https://api.datamarket.azure.com/Bing/Search/v1/Web';
$query = $input;

$offset = 10;
$format = 'JSON';

$url = $root . build_args( $query, 10, $offset, $format );
$ua = LWP::UserAgent->new;
$req = HTTP::Request->new( GET => $url );
$req->authorization_basic( '', $accnt_key );
$response = $ua->request($req);
if ( !$response->is_success ) {
die 'Error connecting to BING API';
}
$json = $response->content;

$perl = from_json($json);
$next_url = $perl->{'d'}->{'__next'};
@results = @{ $perl->{'d'}->{'results'} };
open( $FH, '>', "out.html" )
or die 'Cannot open output file out.html';
print $FH <<ENDHTML;
<HTML>
<HEAD>
<TITLE>CodeFINDER</TITLE>
</HEAD>
<BODY>
<H1 align = "center"><u>RESULTS</u></H1>
ENDHTML

foreach $result (@results) {

code($result);
}
print $FH <<ENDHTML;
</body>
</html>
ENDHTML
close($FH);

$ans = $mw->messageBox(-title=>"done", -type=>"ok", -message=>"completed.", -icon=>"info");
$txt -> delete('1.0','end');
}

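Button Handler code explanation
$accnt_key is the Bing Search API key.
$root is the base URL used to access the Bing API.
$query is the user's input.
The function build_args builds the query string from the query, the number of results, the offset and the format. It is appended to $root to form the final URL, while $accnt_key is sent separately through HTTP basic authentication.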
sub build_args {
$q = '?Query=%27' . shift(@_) . '%27';
$c = '$top=' . shift(@_);
$o = '$skip=' . shift(@_);
$f = '$format=' . shift(@_);
return join( '&', $q, $c, $o, $f );
}
$top is the number of results to return, and $skip is the number of results to skip from the beginning of the result list.
$format is the format in which the data is returned, i.e. JSON.
@results is an array of all the webpage results that the Bing API search returns.
Each link is then accessed one by one using our subroutine code().
Subroutine Code
if ( $url2 =~ /\.(pdf|ppt|doc|docx)$/ )
{
next;
}
$url2 is the URL extracted from an entry of the array @results.
If the URL ends in one of the extensions pdf, ppt, doc or docx, the link is skipped.
$disp_url = $result->{'DisplayUrl'};
$description = $result->{'Description'};
$disp_url contains the display URL of the site and $description contains its description.
Each URL is then fetched, and the page content is matched against the regex.
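
The extension check can also be written as a small predicate, which avoids calling next from inside a subroutine (Perl permits this but warns "Exiting subroutine via next"). The sketch below is one such variant under stated assumptions: the name is_document_url is hypothetical, and ignoring letter case and any query string is an addition that the project code does not make.

```perl
use strict;
use warnings;

# True when the URL path ends in one of the document extensions.
sub is_document_url {
    my ($url) = @_;
    ( my $path = $url ) =~ s/[?#].*//s;    # drop query string and fragment
    return $path =~ /\.(?:pdf|ppt|doc|docx)$/i ? 1 : 0;
}

# Hypothetical result URLs.
my @urls = (
    'http://example.com/sorting.html',
    'http://example.com/notes.PDF',
    'http://example.com/slides.ppt?download=1',
);

# Keep only the URLs that are worth fetching as web pages.
my @pages = grep { !is_document_url($_) } @urls;
print "$_\n" for @pages;
```

Inside code(), the same test can be used with return instead of next to leave the subroutine early.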

Regex used
$func = '(?:int|long|double|float|void|long\ double)\s?\*?\s+?\w{1,30}';
$grp1 = '(?:if|else\ if|for|while)';

$grp2 = '(?:class|struct|typedef\ struct)\s*?\w{1,30}\s*?';
$delim = '{}';
pos($page) = 0;
$regex1 = $grp1 . $reg1;
$regex2 = $grp2;
$regex3 = $func . $reg1;

$regex = join( '|', $regex1, $regex2, $regex3);
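grp1 handles if, else if, for and while.
grp2 handles classes, structures and type definitions.
func handles declarations that begin with a type such as int, long, double, float or void.
$regex is the final matching pattern, made by joining regex1, regex2 and regex3 (built from grp1, grp2 and func) as alternatives. The definition of $reg1 is not shown in this excerpt.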
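
To illustrate how such a pattern can be combined with extract_codeblock and the '{}' delimiter, the sketch below finds the first function header in a sample page and then extracts the balanced brace block that follows it. This is only a sketch: $params is a hypothetical stand-in for the parameter list and is not the project's $reg1, and first_function and the sample page are invented for the example. Note that extract_codeblock was written with Perl code in mind, so unusual C code may confuse it.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_codeblock);

# Return-type and name pattern taken from the project.
my $func = '(?:int|long|double|float|void|long\ double)\s?\*?\s+?\w{1,30}';

# Hypothetical stand-in for the parameter list; not the project's $reg1.
my $params = '\s*\([^)]*\)';

# Return the first function (header plus body) found in the text, if any.
sub first_function {
    my ($page) = @_;
    return unless $page =~ /(${func}${params})/;
    my ( $header, $end ) = ( $1, $+[0] );    # matched header and its end offset

    # Extract the balanced { ... } block that follows the header.
    my $rest = substr( $page, $end );
    my ($body) = extract_codeblock( $rest, '{}' );
    return unless defined $body && length $body;
    return "$header $body";
}

# Hypothetical page content with one C function in it.
my $page = "<p>Example:</p>\nint add(int a, int b) {\n    return a + b;\n}\n<p>End</p>\n";

my $code = first_function($page);
print defined $code ? "$code\n" : "no function found\n";
```

Because the pattern has no word boundary at the start, it can also match inside longer words (for example the "int" in "print"), so a stricter version might begin with \b.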

Removing Javascript
while ( $page =~ s/<script.*?>.*?<\/script>//gsi ) { }
The <script></script> tag is used in HTML to run scripts.
The substitution above removes every script tag together with all the data between the opening and closing tags, so that JavaScript is not mistaken for C/C++ code.
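
The effect of the substitution can be checked on a small sample. The sketch below wraps it in a hypothetical helper named strip_scripts and uses a slightly stricter pattern (a word boundary after "script" and optional whitespace in the closing tag), which is an assumption and not the project's exact regex. The g flag already removes every match in one pass, i makes the match case-insensitive, and s lets . match newlines.

```perl
use strict;
use warnings;

# Remove every <script>...</script> element, including its contents.
sub strip_scripts {
    my ($html) = @_;
    $html =~ s{<script\b[^>]*>.*?</script\s*>}{}gis;
    return $html;
}

# Hypothetical page fragment with two script elements.
my $page = qq{<p>a</p><script type="text/javascript">var x = 1;</script>}
         . qq{<p>b</p><SCRIPT>\nalert(1)\n</SCRIPT>};

print strip_scripts($page), "\n";
```

A regex is adequate for this kind of rough filtering, although an HTML parser would be more reliable on malformed pages.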

Conclusion
When the execution of our program terminates, the output can be seen in the out.html file.
Hence we have successfully separated the C/C++ code from the webpages.
