Callmeboy #3

Merged
merged 13 commits into from
Jun 12, 2020
1 change: 1 addition & 0 deletions .gitignore
@@ -10,6 +10,7 @@

# production
/build
/spider/**

# misc
.DS_Store
19 changes: 17 additions & 2 deletions README.md
@@ -10,18 +10,33 @@

```
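|-public the extension's manifest, HTML, and other static assets
|-src source code
|-spider files fetched by the crawler and their converted output
|---raw-markdown raw solution markdown files fetched from GitHub
|---yield-db-json JSON generated by extracting the title, tags, companies, and per-language solutions from the markdown
|-src source code
|-scripts
|--- constants.js script constants
|--- curl-leetcode.js crawler request logic
|--- LeetCodeProvider.js crawler base class
|--- Logger.js logging helper class
|--- utils.js helpers for regular expressions, file operations, etc.
|---db all problem, tag, and company information
|---App.js the main logic lives here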
```

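## Build
## Crawler
- npm run crawl: this command first pulls the problem list from GitHub and parses the file names into an array, then loops over the problem names and fetches the matching markdown file for each one (it first checks whether the file already exists locally and skips it if so).
Once the problems have been fetched, the markdown is matched against regular expressions and converted into the required JSON files.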
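The conversion script itself is not part of the lines shown here, so the following is only a minimal sketch of what the markdown-to-JSON step could look like. It assumes the title is the first level-1 or level-2 heading and that each solution sits in a fenced block tagged with one of the `SUPPORT_LANGUAGE` values. The function name `markdownToJson` and the output shape are hypothetical, and tags and companies are left out because their markup is not shown.

```javascript
// Hypothetical helper: the real conversion script is not shown in this diff.
const SUPPORT_LANGUAGE = ['java', 'js', 'cpp', 'py']

function markdownToJson(markdown) {
  // Assumption: the first level-1 or level-2 heading holds the problem title.
  const titleMatch = markdown.match(/^#{1,2}\s+(.+)$/m)

  // Assumption: each solution is a fenced block tagged with its language.
  // The fence is built with repeat() so this example can itself live in a fence.
  const fence = '`'.repeat(3)
  const codeBlock = new RegExp(fence + '(\\w+)\\r?\\n([\\s\\S]*?)' + fence, 'g')

  const code = {}
  let match
  while ((match = codeBlock.exec(markdown)) !== null) {
    const [, language, text] = match
    // Keep the first block per supported language; ignore everything else.
    if (SUPPORT_LANGUAGE.includes(language) && !(language in code)) {
      code[language] = text.trim()
    }
  }

  return { title: titleMatch ? titleMatch[1].trim() : null, code }
}
```

Keeping the parser a pure function of the markdown string means it can be checked against a handful of saved files before it is wired into the crawl step.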



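## Build
- npm run build
- Then add the contents of the build folder to the extension; see the `Feature Introduction` section above for details.

> From then on, every time you run npm run build the extension refreshes automatically, with no need to reload it manually.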


## Plan

- [ ] Flesh out the problems. The problems from 91 have higher priority; bring them up to the standard of the two existing problems.
15,099 changes: 15,099 additions & 0 deletions package-lock.json

Large diffs are not rendered by default.

13 changes: 12 additions & 1 deletion package.json
@@ -7,16 +7,20 @@
"@testing-library/react": "^9.3.2",
"@testing-library/user-event": "^7.1.2",
"antd": "^4.3.1",
"highlight.js": "^10.0.3",
"marked": "^1.1.0",
"react": "^16.13.1",
"react-dom": "^16.13.1",
"react-markdown": "^4.3.1",
"react-scripts": "3.4.1"
},
"scripts": {
"lint": "eslint src",
"start": "react-scripts start",
"build": "react-scripts build",
"test": "react-scripts test",
"eject": "react-scripts eject"
"eject": "react-scripts eject",
"crawl": "node scripts/curlLeetcode.js && node scripts/generateLeetcode.js"
},
"browserslist": {
"production": [
@@ -29,5 +33,12 @@
"last 1 firefox version",
"last 1 safari version"
]
},
"devDependencies": {
"axios": "^0.19.2",
"cheerio": "^1.0.0-rc.3",
"iconv-lite": "^0.5.1",
"log4js": "^6.3.0",
"mkdirp": "^1.0.4"
}
}
49 changes: 49 additions & 0 deletions scripts/LeetCodeProvider.js
@@ -0,0 +1,49 @@


const Iconv = require('iconv-lite')
const cheerio = require('cheerio')

const Logger = require('./logger')
const Utils = require('./utils')
const { PROBLEMS_URL, QUESTION_DOM_SELECTOR, BASE_MARKDWON_DOWNLOAD_URL, ENGLISH_MARKDOWN_SIGN } = require('./constants')

const LeetCodeProvider = {

  // Fetch the problem list page and resolve with the markdown file names,
  // excluding the English versions. Resolves with undefined on failure.
  getProblemsTitle() {
    return Utils.httpGet(PROBLEMS_URL)
      .then((body) => {
        const titles = []
        const sHtml = Iconv.decode(body, 'utf-8').toString()
        cheerio.load(sHtml)(QUESTION_DOM_SELECTOR).each((idx, ele) => titles.push(ele.attribs['title']))
        Logger.success('Fetched the problem list successfully')

        return titles.filter(name => !name.endsWith(ENGLISH_MARKDOWN_SIGN))
      })
      .catch(error => {
        Logger.error('Failed to fetch the problem list', error)
      })
  },

  // Download the raw markdown for one problem file name (extension included).
  // Resolves with undefined on failure.
  getProblemDetail(problemNameWithExt) {
    return Utils.httpGet(`${BASE_MARKDWON_DOWNLOAD_URL}${problemNameWithExt}`)
      .then(body => {
        const markdown = Iconv.decode(body, 'utf-8').toString()
        Logger.success(`Fetched problem "${problemNameWithExt}" successfully!`)
        return markdown
      })
      .catch(error => {
        Logger.error(`Failed to fetch problem "${problemNameWithExt}"`, error)
      })
  }

}

module.exports = LeetCodeProvider
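The filter at the end of `getProblemsTitle` calls `endsWith` on every collected `title` attribute, so an element without that attribute would throw, and the `.catch` would then swallow the error. A small sketch of a more defensive version of that one step is shown here. The helper name `dropEnglishVersions` and the sample file names are made up for illustration.

```javascript
const ENGLISH_MARKDOWN_SIGN = '.en.md'

// Keep only entries that are strings and do not end with the English suffix.
function dropEnglishVersions(titles) {
  return titles.filter(
    name => typeof name === 'string' && !name.endsWith(ENGLISH_MARKDOWN_SIGN)
  )
}

// Hypothetical file names, for illustration only.
const sampleTitles = ['1.two-sum.md', '1.two-sum.en.md', undefined, '2.add-two-numbers.md']
console.log(dropEnglishVersions(sampleTitles))
// prints [ '1.two-sum.md', '2.add-two-numbers.md' ]
```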

21 changes: 21 additions & 0 deletions scripts/Logger.js
@@ -0,0 +1,21 @@

const log4js = require("log4js");

const logger = log4js.getLogger()

logger.level = 'debug'
Review comment (Contributor): Consider switching the log level depending on the environment.


logger.category = 'LeetCode'

const Logger = {

success(...args) {
logger.info(...args)
},
error(...args) {
logger.error(...args)
}

}

module.exports = Logger
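The review comment on `logger.level` suggests choosing the level per environment. One way to do that is sketched below as a pure function. The `LOG_LEVEL` variable name, the `NODE_ENV` values, and the level chosen for each are assumptions for illustration, not something this PR defines.

```javascript
// Hypothetical mapping from NODE_ENV to a log4js level name.
const LEVEL_BY_ENV = new Map([
  ['production', 'info'],
  ['test', 'error'],
  ['development', 'debug']
])

function resolveLogLevel(env = process.env) {
  // An explicit LOG_LEVEL (assumed variable name) wins over the NODE_ENV default.
  if (env.LOG_LEVEL) return env.LOG_LEVEL
  // Unknown or missing environments fall back to 'debug', the current hard-coded value.
  return LEVEL_BY_ENV.get(env.NODE_ENV) || 'debug'
}

// In Logger.js this would replace the hard-coded assignment:
// logger.level = resolveLogLevel()
```

Taking the environment as a parameter keeps the lookup testable without mutating `process.env`.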
57 changes: 57 additions & 0 deletions scripts/constants.js
@@ -0,0 +1,57 @@



module.exports = {

  /**
   * Language types to parse
   */
  SUPPORT_LANGUAGE: [
    'java',
    'js',
    'cpp',
    'py'
  ],

  /**
   * Whether forced updates are enabled.
   * When enabled, the local cache is skipped and the latest files are fetched.
   */
  IS_FORCE_UPDATE_MODE: true,

  /**
   * Interval between requests, in ms
   */
  REQUEST_RATE: 300,

  /**
   * Output directory for the markdown files
   */
  RAW_MARKDOWN_OUTPUT_DIR: 'spider/raw-markdown',

  /**
   * Output directory for the converted JSON
   */
  DB_JSON_OUTPUT_DIR: 'spider/yield-db-json',

  /**
   * URL of the problem list
   */
  PROBLEMS_URL: 'https://github.com/azl397985856/leetcode/tree/master/problems',

  /**
   * DOM selector for the problem entries on the fetched page
   */
  QUESTION_DOM_SELECTOR: '.js-navigation-item .content .js-navigation-open',

  /**
   * Base URL for downloading the markdown files
   */
  BASE_MARKDWON_DOWNLOAD_URL: 'https://raw.githubusercontent.com/azl397985856/leetcode/master/problems/',

  /**
   * File-name suffix used to filter out the English documents
   */
  ENGLISH_MARKDOWN_SIGN: '.en.md'

}
68 changes: 68 additions & 0 deletions scripts/curlleetcode.js
@@ -0,0 +1,68 @@
const LeetCodeProvider = require('./leetcodeprovider')
const Logger = require('./logger')
const Utils = require('./utils')

const { RAW_MARKDOWN_OUTPUT_DIR, REQUEST_RATE, IS_FORCE_UPDATE_MODE } = require('./constants')

/**
 * Index of the problem currently being requested
 */
let requsetNumber = 0

Utils.mkdirSync(RAW_MARKDOWN_OUTPUT_DIR)

const getProblemDetail = (questionsName, requsetNumber) => {
  const cachedFilesName = Utils.getDirsFileNameSync(RAW_MARKDOWN_OUTPUT_DIR)

  if (!IS_FORCE_UPDATE_MODE && cachedFilesName.includes(questionsName[requsetNumber])) {
    // Already downloaded: skip it and move straight on to the next problem.
    Logger.success(`${questionsName[requsetNumber]} hit the cache, skipping...`)
    requsetNumber++
    getProblemDetail(questionsName, requsetNumber)
  }
  else {
    questionsName[requsetNumber] && LeetCodeProvider.getProblemDetail(questionsName[requsetNumber]).then(markDown => {
      if (markDown) {
        Logger.success(`Problem: "${questionsName[requsetNumber]}" | Result: ${JSON.stringify(markDown)}`)
        Utils.writeFileSync(RAW_MARKDOWN_OUTPUT_DIR, questionsName[requsetNumber], markDown)
        requsetNumber++
      } else {
        // The index is not advanced here, so the same problem is requested again next time.
        Logger.error(`Failed to fetch the markdown for ${questionsName[requsetNumber]}!`)
      }
    }).catch(Logger.error).then(() => {
      // Wait REQUEST_RATE ms before issuing the next request.
      setTimeout(() => {
        questionsName[requsetNumber] && getProblemDetail(questionsName, requsetNumber)
      }, REQUEST_RATE)
    })
  }
}

LeetCodeProvider.getProblemsTitle().then(questionsName => {
  getProblemDetail(questionsName, requsetNumber)
})
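The script above drives the crawl with a function that calls itself again from a `setTimeout` callback. As an alternative, the same flow (cache check, fetch, write, wait) can be sketched as a plain `async` loop. Everything injected here (`fetchDetail`, `isCached`, `save`) is a hypothetical stand-in for the `LeetCodeProvider` and `Utils` calls, and the function name `crawlAll` is made up. Unlike the script above, this sketch records a failed name and moves on rather than requesting it again.

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

// Hypothetical helper: fetchDetail, isCached and save are injected so the loop
// can be exercised without network or disk access.
async function crawlAll(names, { fetchDetail, isCached, save, rateMs = 300, forceUpdate = false }) {
  const failed = []
  for (const name of names) {
    // Cache hit: no request is made, so no delay is needed either.
    if (!forceUpdate && isCached(name)) continue

    let markdown
    try {
      markdown = await fetchDetail(name)
    } catch (error) {
      markdown = undefined
    }

    if (markdown) {
      save(name, markdown)
    } else {
      failed.push(name) // recorded and skipped, not retried
    }

    // Space out consecutive requests by rateMs milliseconds.
    await sleep(rateMs)
  }
  return failed
}
```

Returning the list of failed names leaves the retry policy to the caller, for example a second pass over just those names.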



