A11 Merged
February 7, 2024
[35]: import nltk
      nltk.download('punkt')
      nltk.download('wordnet')

[35]: True
the PLA Rocket Force national defense science and technology experts panel,
according to a report published by the PLA Daily on Saturday. Honored as
“rocket force science and technology experts,” Zhang and his fellow experts
from private companies will serve as members of the PLA Rocket Force think
tank, which will conduct research into fields like overall design of the
missiles, missile launching and network system technology for five years.
The experts will enjoy the same treatment as their counterparts from
State-owned firms, the report said.
The PLA Daily said that this marks a new development in deepening
military-civilian integration in China, which could make science and
technology innovation better contribute to the enhancement of the force’s
combat capabilities.
"""
1 Whitespace tokenization
[11]: whitespace_token = text.split()
[14]: print(whitespace_token)
2 Punctuation tokenization
[15]: from nltk.tokenize import wordpunct_tokenize

      punc_token = wordpunct_tokenize(text)
[16]: print(punc_token)
['The', 'think', 'tank', 'of', 'China', '’', 's', 'People', '’', 's',
'Liberation', 'Army', 'Rocket', 'Force', 'recently', 'recruited', '13',
'Chinese', 'technicians', 'from', 'private', 'companies', ',', 'PLA', 'Daily',
'reported', 'on', 'Saturday', '.', 'Zhang', 'Hao', 'and', '12', 'other',
'science', 'and', 'technology', 'experts', 'received', 'letters', 'of',
'appointment', 'at', 'the', 'founding', 'ceremony', 'of', 'the', 'PLA',
'Rocket', 'Force', 'national', 'defense', 'science', 'and', 'technology',
'experts', 'panel', ',', 'according', 'to', 'a', 'report', 'published', 'by',
'the', 'PLA', 'Daily', 'on', 'Saturday', '.', 'Honored', 'as', '“', 'rocket',
'force', 'science', 'and', 'technology', 'experts', ',”', 'Zhang', 'and', 'his',
'fellow', 'experts', 'from', 'private', 'companies', 'will', 'serve', 'as',
'members', 'of', 'the', 'PLA', 'Rocket', 'Force', 'think', 'tank', ',', 'which',
'will', 'conduct', 'research', 'into', 'fields', 'like', 'overall', 'design',
'of', 'the', 'missiles', ',', 'missile', 'launching', 'and', 'network',
'system', 'technology', 'for', 'five', 'years', '.', 'The', 'experts', 'will',
'enjoy', 'the', 'same', 'treatment', 'as', 'their', 'counterparts', 'from',
'State', '-', 'owned', 'firms', ',', 'the', 'report', 'said', '.', 'The', 'PLA',
'Daily', 'said', 'that', 'this', 'marks', 'a', 'new', 'development', 'in',
'deepening', 'military', '-', 'civilian', 'integration', 'in', 'China', ',',
'which', 'could', 'make', 'science', 'and', 'technology', 'innovation',
'better', 'contribute', 'to', 'the', 'enhancement', 'of', 'the', 'force', '’',
's', 'combat', 'capabilities', '.']
3 Treebank tokenization
[18]: from nltk.tokenize import TreebankWordTokenizer

      tokenizer = TreebankWordTokenizer()
      tbank_token = tokenizer.tokenize(text)  # this line was lost in the export

[20]: print(tbank_token)
4 TweetTokenizer
[23]: from nltk.tokenize import TweetTokenizer

      tokenizer = TweetTokenizer()
      tweet_token = tokenizer.tokenize(text)
      print(tweet_token)
['The', 'think', 'tank', 'of', 'China', '’', 's', 'People', '’', 's',
'Liberation', 'Army', 'Rocket', 'Force', 'recently', 'recruited', '13',
'Chinese', 'technicians', 'from', 'private', 'companies', ',', 'PLA', 'Daily',
'reported', 'on', 'Saturday', '.', 'Zhang', 'Hao', 'and', '12', 'other',
'science', 'and', 'technology', 'experts', 'received', 'letters', 'of',
'appointment', 'at', 'the', 'founding', 'ceremony', 'of', 'the', 'PLA',
'Rocket', 'Force', 'national', 'defense', 'science', 'and', 'technology',
'experts', 'panel', ',', 'according', 'to', 'a', 'report', 'published', 'by',
'the', 'PLA', 'Daily', 'on', 'Saturday', '.', 'Honored', 'as', '“', 'rocket',
'force', 'science', 'and', 'technology', 'experts', ',', '”', 'Zhang', 'and',
'his', 'fellow', 'experts', 'from', 'private', 'companies', 'will', 'serve',
'as', 'members', 'of', 'the', 'PLA', 'Rocket', 'Force', 'think', 'tank', ',',
'which', 'will', 'conduct', 'research', 'into', 'fields', 'like', 'overall',
'design', 'of', 'the', 'missiles', ',', 'missile', 'launching', 'and',
'network', 'system', 'technology', 'for', 'five', 'years', '.', 'The',
'experts', 'will', 'enjoy', 'the', 'same', 'treatment', 'as', 'their',
'counterparts', 'from', 'State-owned', 'firms', ',', 'the', 'report', 'said',
'.', 'The', 'PLA', 'Daily', 'said', 'that', 'this', 'marks', 'a', 'new',
'development', 'in', 'deepening', 'military-civilian', 'integration', 'in',
'China', ',', 'which', 'could', 'make', 'science', 'and', 'technology',
'innovation', 'better', 'contribute', 'to', 'the', 'enhancement', 'of', 'the',
'force', '’', 's', 'combat', 'capabilities', '.']
5 MWE
[24]: from nltk.tokenize import MWETokenizer, word_tokenize

      tokenizer = MWETokenizer()
      # the line producing `mwe` was lost in the export; MWETokenizer works on
      # a token list, so it was presumably something like:
      mwe = tokenizer.tokenize(word_tokenize(text))

[26]: print(mwe)
['The', 'think', 'tank', 'of', 'China', '’', 's', 'People', '’', 's',
'Liberation', 'Army', 'Rocket', 'Force', 'recently', 'recruited', '13',
'Chinese', 'technicians', 'from', 'private', 'companies', ',', 'PLA', 'Daily',
'reported', 'on', 'Saturday', '.', 'Zhang', 'Hao', 'and', '12', 'other',
'science', 'and', 'technology', 'experts', 'received', 'letters', 'of',
'appointment', 'at', 'the', 'founding', 'ceremony', 'of', 'the', 'PLA',
'Rocket', 'Force', 'national', 'defense', 'science', 'and', 'technology',
'experts', 'panel', ',', 'according', 'to', 'a', 'report', 'published', 'by',
'the', 'PLA', 'Daily', 'on', 'Saturday', '.', 'Honored', 'as', '“', 'rocket',
'force', 'science', 'and', 'technology', 'experts', ',', '”', 'Zhang', 'and',
'his', 'fellow', 'experts', 'from', 'private', 'companies', 'will', 'serve',
'as', 'members', 'of', 'the', 'PLA', 'Rocket', 'Force', 'think', 'tank', ',',
'which', 'will', 'conduct', 'research', 'into', 'fields', 'like', 'overall',
'design', 'of', 'the', 'missiles', ',', 'missile', 'launching', 'and',
'network', 'system', 'technology', 'for', 'five', 'years', '.', 'The',
'experts', 'will', 'enjoy', 'the', 'same', 'treatment', 'as', 'their',
'counterparts', 'from', 'State-owned', 'firms', ',', 'the', 'report', 'said',
'.', 'The', 'PLA', 'Daily', 'said', 'that', 'this', 'marks', 'a', 'new',
'development', 'in', 'deepening', 'military-civilian', 'integration', 'in',
'China', ',', 'which', 'could', 'make', 'science', 'and', 'technology',
'innovation', 'better', 'contribute', 'to', 'the', 'enhancement', 'of', 'the',
'force', '’', 's', 'combat', 'capabilities', '.']
6 Porter stemmer
[27]: from nltk.stem.porter import PorterStemmer

      p_stemmer = PorterStemmer()
      words = ['run', 'runner', 'running', 'ran', 'runs', 'easily', 'fairly']
      for word in words:
          print(word + ' --> ' + p_stemmer.stem(word))
7 Snowball stemmer
[28]: from nltk.stem.snowball import SnowballStemmer
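The cell above only imports the class; the application step is missing from
this export. A minimal sketch, assuming the same word list is reused with the
English Snowball stemmer:

      # Sketch (not in the original export): apply the English Snowball
      # stemmer to the same words used with the Porter stemmer above.
      s_stemmer = SnowballStemmer(language='english')
      words = ['run', 'runner', 'running', 'ran', 'runs', 'easily', 'fairly']
      for word in words:
          print(word + ' --> ' + s_stemmer.stem(word))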
8 Lemmatization
[36]: from nltk.stem import WordNetLemmatizer

      lemmatizer = WordNetLemmatizer()
      words = ['run', 'runner', 'running', 'ran', 'runs', 'easily', 'fairly']
      for word in words:
          print(word + ' --> ' + lemmatizer.lemmatize(word))
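Note that WordNetLemmatizer treats every word as a noun by default, so verb
forms such as 'running' and 'ran' come back unchanged unless a part-of-speech
tag is supplied:

      # Lemmatize as verbs by passing pos='v' (the WordNet verb tag);
      # 'running' and 'ran' then both reduce to 'run'.
      for word in words:
          print(word + ' --> ' + lemmatizer.lemmatize(word, pos='v'))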
[ ]:
Untitled
February 7, 2024
[38]: import nltk
      nltk.download('punkt')

[38]: True
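The cell that built bow_df does not appear in this export. A plausible sketch
using scikit-learn's CountVectorizer; the names count_vec and c_matrix, and
the assumption that the documents live in data.text, are mine:

      # Sketch (defining cell missing from the export): bag-of-words counts
      # over the corpus; vocabulary terms such as '200sx' and 'audi' in the
      # output below come from whatever text column was vectorized here.
      import pandas as pd
      from sklearn.feature_extraction.text import CountVectorizer

      count_vec = CountVectorizer()
      c_matrix = count_vec.fit_transform(data.text)  # `data` assumed loaded earlier
      bow_df = pd.DataFrame(c_matrix.toarray(),
                            columns=count_vec.get_feature_names_out())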
[31]: bow_df
[31]: 100 124 190 200 200sx 240sx all audi automatic benz … \
0 0 0 0 0 0 0 0 0 0 0 …
1 0 0 0 0 0 0 0 0 0 0 …
2 0 0 0 0 0 0 0 0 0 0 …
3 0 0 0 0 0 0 0 0 0 0 …
4 0 0 0 0 0 0 0 0 0 0 …
.. … … … … … … … … … … …
95 0 0 0 0 1 0 0 0 0 0 …
96 0 0 0 0 0 1 0 0 0 0 …
97 0 0 0 0 0 1 0 0 0 0 …
98 0 0 0 0 0 1 0 0 0 0 …
99 0 0 0 0 0 1 0 0 0 0 …
wagon wheel
0 0 1
1 0 1
2 0 1
3 0 1
4 0 1
.. … …
95 0 1
96 0 1
97 0 1
98 0 1
99 0 1
[19]: # TF-IDF
      tf_df = pd.DataFrame(t_matrix.toarray(),
                           columns=tfidf_vec.get_feature_names_out())
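Neither tfidf_vec nor t_matrix is defined anywhere in this export; the cell
above presumably followed something like this sketch (same corpus assumption
as for bow_df):

      # Sketch (defining cell missing): fit a TF-IDF vectorizer on the corpus.
      from sklearn.feature_extraction.text import TfidfVectorizer

      tfidf_vec = TfidfVectorizer()
      t_matrix = tfidf_vec.fit_transform(data.text)  # `data` assumed loaded earlier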
[33]: tf_df
[33]: 100 124 190 200 200sx 240sx all audi automatic benz … \
0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 …
1 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 …
2 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 …
3 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 …
4 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 …
.. … … … … … … … … … … …
95 0.0 0.0 0.0 0.0 0.504103 0.000000 0.0 0.0 0.0 0.0 …
96 0.0 0.0 0.0 0.0 0.000000 0.579066 0.0 0.0 0.0 0.0 …
97 0.0 0.0 0.0 0.0 0.000000 0.579066 0.0 0.0 0.0 0.0 …
98 0.0 0.0 0.0 0.0 0.000000 0.579066 0.0 0.0 0.0 0.0 …
99 0.0 0.0 0.0 0.0 0.000000 0.579066 0.0 0.0 0.0 0.0 …
[34]: # Word2Vec
[39]: m = pd.DataFrame()
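The cells that trained the model and assembled w2vec_data are missing from
this export. A hedged sketch with gensim; the names sentences and model, the
vector size, and the 17-column layout are all assumptions inferred from the
output below:

      # Sketch only (not the original notebook's code): train Word2Vec on the
      # tokenized corpus, then store the vector of each document's i-th token
      # in column w2vec_embedding_{i+1}, padding short documents with None.
      import pandas as pd
      from gensim.models import Word2Vec

      sentences = [doc.split() for doc in data.text]  # tokenized corpus (assumed)
      model = Word2Vec(sentences, vector_size=100, min_count=1)

      w2vec_data = pd.DataFrame({
          f'w2vec_embedding_{i + 1}': [
              model.wv[doc[i]] if i < len(doc) else None for doc in sentences
          ]
          for i in range(17)
      })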
[46]: w2vec_data
[46]: w2vec_embedding_1 \
0 [1.0144914, 0.09865342, -1.038593, -0.25373653…
1 [1.0144914, 0.09865342, -1.038593, -0.25373653…
2 [1.0144914, 0.09865342, -1.038593, -0.25373653…
3 [1.0144914, 0.09865342, -1.038593, -0.25373653…
4 [1.0144914, 0.09865342, -1.038593, -0.25373653…
… …
11909 [0.712496, 0.2578129, -0.4413481, -0.0887651, …
11910 [0.712496, 0.2578129, -0.4413481, -0.0887651, …
11911 [0.712496, 0.2578129, -0.4413481, -0.0887651, …
11912 [0.712496, 0.2578129, -0.4413481, -0.0887651, …
11913 [0.079442665, 0.18170413, -0.12031996, 0.13398…
w2vec_embedding_2 \
0 [0.102402635, 0.32382426, 0.42588675, 0.177568…
1 [0.102402635, 0.32382426, 0.42588675, 0.177568…
2 [0.102402635, 0.32382426, 0.42588675, 0.177568…
3 [0.102402635, 0.32382426, 0.42588675, 0.177568…
4 [0.102402635, 0.32382426, 0.42588675, 0.177568…
… …
11909 [0.102402635, 0.32382426, 0.42588675, 0.177568…
11910 [0.102402635, 0.32382426, 0.42588675, 0.177568…
11911 [0.102402635, 0.32382426, 0.42588675, 0.177568…
11912 [0.102402635, 0.32382426, 0.42588675, 0.177568…
11913 [0.102402635, 0.32382426, 0.42588675, 0.177568…
w2vec_embedding_3 \
0 [0.19682735, -0.02487174, -0.18721059, -0.0303…
1      [0.19682735, -0.02487174, -0.18721059, -0.0303…
2      [0.19682735, -0.02487174, -0.18721059, -0.0303…
3      [0.19682735, -0.02487174, -0.18721059, -0.0303…
4      [0.19682735, -0.02487174, -0.18721059, -0.0303…
… …
11909 [0.070702516, -0.0059914645, -0.05066969, 0.05…
11910 [0.070702516, -0.0059914645, -0.05066969, 0.05…
11911 [0.070702516, -0.0059914645, -0.05066969, 0.05…
11912 [0.070702516, -0.0059914645, -0.05066969, 0.05…
11913 [-0.003097159, -0.0026490043, -0.0028470934, -…
w2vec_embedding_4 \
0 [0.50986725, -0.010175074, -0.5086816, -0.2202…
1 [0.50986725, -0.010175074, -0.5086816, -0.2202…
2 [0.50986725, -0.010175074, -0.5086816, -0.2202…
3 [0.50986725, -0.010175074, -0.5086816, -0.2202…
4 [0.50986725, -0.010175074, -0.5086816, -0.2202…
… …
11909 [0.102402635, 0.32382426, 0.42588675, 0.177568…
11910 [0.102402635, 0.32382426, 0.42588675, 0.177568…
11911 [0.102402635, 0.32382426, 0.42588675, 0.177568…
11912 [0.102402635, 0.32382426, 0.42588675, 0.177568…
11913 [0.102402635, 0.32382426, 0.42588675, 0.177568…
w2vec_embedding_5 \
0 [0.13098931, -0.012665014, -0.1817777, 0.04231…
1 [0.102402635, 0.32382426, 0.42588675, 0.177568…
2 [0.102402635, 0.32382426, 0.42588675, 0.177568…
3 [0.102402635, 0.32382426, 0.42588675, 0.177568…
4 [0.102402635, 0.32382426, 0.42588675, 0.177568…
… …
11909 [-0.61892426, -0.26941118, 0.4358393, -0.12083…
11910 [-0.61892426, -0.26941118, 0.4358393, -0.12083…
11911 [-0.61892426, -0.26941118, 0.4358393, -0.12083…
11912 [-0.61892426, -0.26941118, 0.4358393, -0.12083…
11913 [-0.44647923, -0.71473974, 0.43583885, 0.07716…
w2vec_embedding_6 \
0 [0.102402635, 0.32382426, 0.42588675, 0.177568…
1 [-0.61892426, -0.26941118, 0.4358393, -0.12083…
2 [-0.61892426, -0.26941118, 0.4358393, -0.12083…
3 [-0.61892426, -0.26941118, 0.4358393, -0.12083…
4 [-0.61892426, -0.26941118, 0.4358393, -0.12083…
… …
11909 [-0.44654804, -0.0011222507, 1.6705241, -0.013…
11910 [-0.44654804, -0.0011222507, 1.6705241, -0.013…
11911 [-0.44654804, -0.0011222507, 1.6705241, -0.013…
11912 [-0.44654804, -0.0011222507, 1.6705241, -0.013…
11913 [-0.44654804, -0.0011222507, 1.6705241, -0.013…
w2vec_embedding_7 \
0 [-0.61892426, -0.26941118, 0.4358393, -0.12083…
1 [-0.44654804, -0.0011222507, 1.6705241, -0.013…
2 [-0.44654804, -0.0011222507, 1.6705241, -0.013…
3 [-0.44654804, -0.0011222507, 1.6705241, -0.013…
4 [-0.44654804, -0.0011222507, 1.6705241, -0.013…
… …
11909 [-0.23426266, -0.8697558, -0.09931712, 1.08638…
11910 [-0.23426266, -0.8697558, -0.09931712, 1.08638…
11911 [-0.23426266, -0.8697558, -0.09931712, 1.08638…
11912 [-0.23426266, -0.8697558, -0.09931712, 1.08638…
11913 [0.102402635, 0.32382426, 0.42588675, 0.177568…
w2vec_embedding_8 \
0 [-0.44654804, -0.0011222507, 1.6705241, -0.013…
1 [-0.23426266, -0.8697558, -0.09931712, 1.08638…
2 [-0.23426266, -0.8697558, -0.09931712, 1.08638…
3 [-0.23426266, -0.8697558, -0.09931712, 1.08638…
4 [-0.23426266, -0.8697558, -0.09931712, 1.08638…
… …
11909 [0.74533176, 0.5301534, 0.63357615, 0.6397639,…
11910 [0.74533176, 0.5301534, 0.63357615, 0.6397639,…
11911 [0.74533176, 0.5301534, 0.63357615, 0.6397639,…
11912 [1.3811533, 0.13443133, 0.67313755, 0.33346632…
11913 [-0.18816459, -0.45196122, -0.21790302, 0.9887…
w2vec_embedding_9 \
0 [-0.23426266, -0.8697558, -0.09931712, 1.08638…
1 [0.74533176, 0.5301534, 0.63357615, 0.6397639,…
2 [0.74533176, 0.5301534, 0.63357615, 0.6397639,…
3 [0.74533176, 0.5301534, 0.63357615, 0.6397639,…
4 [0.74533176, 0.5301534, 0.63357615, 0.6397639,…
… …
11909 [0.93188065, -0.14711751, 1.1007845, -0.315147…
11910 [0.93188065, -0.14711751, 1.1007845, -0.315147…
11911 [0.93188065, -0.14711751, 1.1007845, -0.315147…
11912 [0.93188065, -0.14711751, 1.1007845, -0.315147…
11913 [0.102402635, 0.32382426, 0.42588675, 0.177568…
w2vec_embedding_10 \
0 [0.74533176, 0.5301534, 0.63357615, 0.6397639,…
1 [0.93188065, -0.14711751, 1.1007845, -0.315147…
2 [0.93188065, -0.14711751, 1.1007845, -0.315147…
3 [0.93188065, -0.14711751, 1.1007845, -0.315147…
4 [0.93188065, -0.14711751, 1.1007845, -0.315147…
… …
11909 [0.102402635, 0.32382426, 0.42588675, 0.177568…
11910 [0.102402635, 0.32382426, 0.42588675, 0.177568…
11911 [0.102402635, 0.32382426, 0.42588675, 0.177568…
11912 [0.102402635, 0.32382426, 0.42588675, 0.177568…
11913 [0.20108287, 0.6029379, 0.2330216, -0.3212626,…
w2vec_embedding_11 \
0 [0.93188065, -0.14711751, 1.1007845, -0.315147…
1 [0.102402635, 0.32382426, 0.42588675, 0.177568…
2 [0.102402635, 0.32382426, 0.42588675, 0.177568…
3 [0.102402635, 0.32382426, 0.42588675, 0.177568…
4 [0.102402635, 0.32382426, 0.42588675, 0.177568…
… …
11909 [-0.18816459, -0.45196122, -0.21790302, 0.9887…
11910 [-0.18816459, -0.45196122, -0.21790302, 0.9887…
11911 [-0.18816459, -0.45196122, -0.21790302, 0.9887…
11912 [-0.18816459, -0.45196122, -0.21790302, 0.9887…
11913 [-0.30040318, 0.5620006, 0.7801722, 0.9519743,…
w2vec_embedding_12 \
0 [0.102402635, 0.32382426, 0.42588675, 0.177568…
1 [-0.22948255, -0.4156468, -0.03284373, 0.07473…
2 [-0.22948255, -0.4156468, -0.03284373, 0.07473…
3 [-0.22948255, -0.4156468, -0.03284373, 0.07473…
4 [-0.22948255, -0.4156468, -0.03284373, 0.07473…
… …
11909 [0.102402635, 0.32382426, 0.42588675, 0.177568…
11910 [0.102402635, 0.32382426, 0.42588675, 0.177568…
11911 [0.102402635, 0.32382426, 0.42588675, 0.177568…
11912 [0.102402635, 0.32382426, 0.42588675, 0.177568…
11913 [-0.14079317, 0.5140112, 0.56032175, -0.192161…
w2vec_embedding_13 \
0      [-0.22948255, -0.4156468, -0.03284373, 0.07473…
1      [0.102402635, 0.32382426, 0.42588675, 0.177568…
2      [0.102402635, 0.32382426, 0.42588675, 0.177568…
3      [0.102402635, 0.32382426, 0.42588675, 0.177568…
4      [0.102402635, 0.32382426, 0.42588675, 0.177568…
… …
11909 [0.23303074, 0.79225045, 0.5197343, 0.6270899,…
11910 [0.23303074, 0.79225045, 0.5197343, 0.6270899,…
11911 [0.23303074, 0.79225045, 0.5197343, 0.6270899,…
11912 [0.23303074, 0.79225045, 0.5197343, 0.6270899,…
11913 [0.102402635, 0.32382426, 0.42588675, 0.177568…
w2vec_embedding_14 \
0 [0.102402635, 0.32382426, 0.42588675, 0.177568…
1 [0.07896054, 0.3125886, 0.21870708, -0.3991353…
2 [0.07896054, 0.3125886, 0.21870708, -0.3991353…
3 [0.07896054, 0.3125886, 0.21870708, -0.3991353…
4 [0.07896054, 0.3125886, 0.21870708, -0.3991353…
… …
11909 [-0.30040318, 0.5620006, 0.7801722, 0.9519743,…
11910 [-0.30040318, 0.5620006, 0.7801722, 0.9519743,…
11911 [-0.30040318, 0.5620006, 0.7801722, 0.9519743,…
11912 [-0.30040318, 0.5620006, 0.7801722, 0.9519743,…
11913 [-0.6609878, 0.4256675, 1.4135256, 0.87129855,…
w2vec_embedding_15 \
0 [0.07896054, 0.3125886, 0.21870708, -0.3991353…
1 [-0.30040318, 0.5620006, 0.7801722, 0.9519743,…
2 [-0.30040318, 0.5620006, 0.7801722, 0.9519743,…
3 [-0.30040318, 0.5620006, 0.7801722, 0.9519743,…
4 [-0.30040318, 0.5620006, 0.7801722, 0.9519743,…
… …
11909 [-0.14079317, 0.5140112, 0.56032175, -0.192161…
11910 [-0.14079317, 0.5140112, 0.56032175, -0.192161…
11911 [-0.14079317, 0.5140112, 0.56032175, -0.192161…
11912 [-0.14079317, 0.5140112, 0.56032175, -0.192161…
11913 [0.102402635, 0.32382426, 0.42588675, 0.177568…
w2vec_embedding_16 \
0      [-0.30040318, 0.5620006, 0.7801722, 0.9519743,…
1      [-0.14079317, 0.5140112, 0.56032175, -0.192161…
2      [-0.14079317, 0.5140112, 0.56032175, -0.192161…
3      [-0.14079317, 0.5140112, 0.56032175, -0.192161…
4      [-0.14079317, 0.5140112, 0.56032175, -0.192161…
… …
11909 [0.102402635, 0.32382426, 0.42588675, 0.177568…
11910 [0.102402635, 0.32382426, 0.42588675, 0.177568…
11911 [0.102402635, 0.32382426, 0.42588675, 0.177568…
11912 [0.102402635, 0.32382426, 0.42588675, 0.177568…
11913 [0.18458737, 0.88140917, 1.2577158, 0.6808571,…
w2vec_embedding_17
0 [-0.14079317, 0.5140112, 0.56032175, -0.192161…
1 [0.102402635, 0.32382426, 0.42588675, 0.177568…
2 [0.102402635, 0.32382426, 0.42588675, 0.177568…
3 [0.102402635, 0.32382426, 0.42588675, 0.177568…
4 [0.102402635, 0.32382426, 0.42588675, 0.177568…
… …
11909 [-0.62329394, -0.27842516, 1.6441574, 0.278253…
11910 [-0.62329394, -0.27842516, 1.6441574, 0.278253…
11911 [-0.62329394, -0.27842516, 1.6441574, 0.278253…
11912 [-0.62329394, -0.27842516, 1.6441574, 0.278253…
11913 [0.102402635, 0.32382426, 0.42588675, 0.177568…
[ ]:
Untitled
February 7, 2024
[26]: import nltk
      nltk.download('stopwords')

[26]: True
[19]: data.head()

      # The top of this cell was lost in the export; the bare `return` implies
      # the lines below sit inside a cleaning function. Reconstructed here with
      # its likely imports (the name `clean_text` is an assumption):
      from nltk.tokenize import word_tokenize
      from nltk.corpus import stopwords
      from nltk.stem import WordNetLemmatizer

      def clean_text(text):
          tokens = word_tokenize(text)
          stop_words = set(stopwords.words('english'))
          tokens = [token for token in tokens if token not in stop_words]
          lemmatizer = WordNetLemmatizer()
          tokens = [lemmatizer.lemmatize(token) for token in tokens]
          cleaned_text = ' '.join(tokens)
          return cleaned_text
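`clean` is used in the next cell but never defined in this export; one
possibility consistent with the later cells (the construction is a guess):

      # Sketch (defining cell missing): apply the cleaning function to each
      # article in the Content column.
      import pandas as pd

      clean = pd.DataFrame({'text': data['Content'].apply(clean_text)})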
      from sklearn.preprocessing import LabelEncoder

      label_encoder = LabelEncoder()
      clean["encoded_label"] = label_encoder.fit_transform(data["Category"])
[29]: clean
encoded_label
0 0
1 0
2 0
3 0
4 0
… …
2220 4
2221 4
2222 4
2223 4
2224 4
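combined_text is likewise never defined here. Judging by the trailing
'- business - 0' on the sample further below, it apparently joins the cleaned
text with the category and encoded label; a sketch, entirely inferred:

      # Sketch (defining cell missing): append category and encoded label to
      # the cleaned text, matching the '… - business - 0' suffix visible below.
      combined_text = pd.DataFrame({
          'text': clean['text'] + ' - ' + data['Category'] + ' - '
                  + clean['encoded_label'].astype(str)
      })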
[31]: combined_text
[31]: text
0 ad sale boost time warner profit quarterly pro…
1 dollar gain greenspan speech dollar hit highes…
2 yukos unit buyer face loan claim owner embattl…
3 high fuel price hit ba profit british airway b…
4 pernod takeover talk lift domecq share uk drin…
… …
2220 bt program beat dialler scam bt introducing tw…
2221 spam e mail tempt net shopper computer user ac…
2222 careful code new european directive could put …
2223 u cyber security chief resigns man making sure…
2224 losing online gaming online role playing game …
[32]: # unclean
[33]: data.Content[1]
[33]: 'Dollar gains on Greenspan speech\r\n\r\nThe dollar has hit its highest level
against the euro in almost three months after the Federal Reserve head said the
US trade deficit is set to stabilise.\r\n\r\nAnd Alan Greenspan highlighted the
US government\'s willingness to curb spending and rising household savings as
factors which may help to reduce it. In late trading in New York, the dollar
reached $1.2871 against the euro, from $1.2974 on Thursday. Market concerns
about the deficit has hit the greenback in recent months. On Friday, Federal
Reserve chairman Mr Greenspan\'s speech in London ahead of the meeting of G7
finance ministers sent the dollar higher after it had earlier tumbled on the
back of worse-than-expected US jobs data. "I think the chairman\'s taking a much
more sanguine view on the current account deficit than he\'s taken for some
time," said Robert Sinche, head of currency strategy at Bank of America in New
York. "He\'s taking a longer-term view, laying out a set of conditions under
which the current account deficit can improve this year and
next."\r\n\r\nWorries about the deficit concerns about China do, however,
remain. China\'s currency remains pegged to the dollar and the US currency\'s
sharp falls in recent months have therefore made Chinese export prices highly
competitive. But calls for a shift in Beijing\'s policy have fallen on deaf
ears, despite recent comments in a major Chinese newspaper that the "time is
ripe" for a loosening of the peg. The G7 meeting is thought unlikely to produce
any meaningful movement in Chinese policy. In the meantime, the US Federal
Reserve\'s decision on 2 February to boost interest rates by a quarter of a
point - the sixth such move in as many months - has opened up a differential
with European rates. The half-point window, some believe, could be enough to
keep US assets looking more attractive, and could help prop up the dollar. The
recent falls have partly been the result of big budget deficits, as well as the
US\'s yawning current account gap, both of which need to be funded by the buying
of US bonds and assets by foreign firms and governments. The White House will
announce its budget on Monday, and many commentators believe the deficit will
remain at close to half a trillion dollars.'
[34]: combined_text.text[1]
[34]: 'dollar gain greenspan speech dollar hit highest level euro almost three month
federal reserve head said u trade deficit set stabilise alan greenspan
highlighted u government willingness curb spending rising household saving
factor may help reduce late trading new york dollar reached euro thursday market
concern deficit hit greenback recent month friday federal reserve chairman mr
greenspan speech london ahead meeting g finance minister sent dollar higher
earlier tumbled back worse expected u job data think chairman taking much
sanguine view current account deficit taken time said robert sinche head
currency strategy bank america new york taking longer term view laying set
condition current account deficit improve year next worry deficit concern china
however remain china currency remains pegged dollar u currency sharp fall recent
month therefore made chinese export price highly competitive call shift beijing
policy fallen deaf ear despite recent comment major chinese newspaper time ripe
loosening peg g meeting thought unlikely produce meaningful movement chinese
policy meantime u federal reserve decision february boost interest rate quarter
point sixth move many month opened differential european rate half point window
believe could enough keep u asset looking attractive could help prop dollar
recent fall partly result big budget deficit well u yawning current account gap
need funded buying u bond asset foreign firm government white house announce
budget monday many commentator believe deficit remain close half trillion dollar
- business - 0'
[ ]:
[35]: # TF-IDF
[41]: tf_df
7 0.057341 0.057341 0.000000 0.085293 0.000000 0.000000 0.000000
8 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
9 0.000000 0.000000 0.000000 0.000000 0.071168 0.071168 0.071168
[ ]: