
Dataset is a high-level API introduced in TensorFlow 1.3. In 1.3 it was still experimental; it was officially released in 1.4.

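After searching online, I found very little material on loading text with Dataset. The official example only covers the CSV format, which requires every sample in the CSV file to have the same dimensions. That means the padding has to be done before the CSV file is written, which increases the file size.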

After some trial and error, here is an example of loading variable-length samples with Dataset + TFRecords.

First, write the variable-length data to a TFRecords file:


    import tensorflow as tf

    def writedata():
        # Toy data to show samples of different lengths: the first sample
        # has 3 tokens and label 1, the second has 4 tokens and label 2.
        xlist = [[1, 2, 3], [4, 5, 6, 8]]
        ylist = [1, 2]
        writer = tf.python_io.TFRecordWriter("train.tfrecords")
        for x, y in zip(xlist, ylist):
            example = tf.train.Example(features=tf.train.Features(feature={
                "y": tf.train.Feature(int64_list=tf.train.Int64List(value=[y])),
                "x": tf.train.Feature(int64_list=tf.train.Int64List(value=x)),
            }))
            writer.write(example.SerializeToString())
        writer.close()

    writedata()  # create train.tfrecords before the loading step

Then load it with Dataset:


    import tensorflow as tf

    feature_names = ['x']

    def my_input_fn(file_path, perform_shuffle=False, repeat_count=1):
        def parse(example_proto):
            # "x" is variable-length, so it is parsed as a sparse tensor;
            # "y" always holds exactly one value.
            features = {"x": tf.VarLenFeature(tf.int64),
                        "y": tf.FixedLenFeature([1], tf.int64)}
            parsed_features = tf.parse_single_example(example_proto, features)
            x = tf.sparse_tensor_to_dense(parsed_features["x"])
            x = tf.cast(x, tf.int32)
            x = dict(zip(feature_names, [x]))
            y = tf.cast(parsed_features["y"], tf.int32)
            return x, y

        # TensorFlow 1.4+; in 1.3 the class is tf.contrib.data.TFRecordDataset.
        dataset = tf.data.TFRecordDataset(file_path).map(parse)
        if perform_shuffle:
            dataset = dataset.shuffle(buffer_size=256)
        dataset = dataset.repeat(repeat_count)
        # Batch size is 2, and x is padded to maxlen=6.
        dataset = dataset.padded_batch(2, padded_shapes=({'x': [6]}, [1]))
        iterator = dataset.make_one_shot_iterator()
        batch_features, batch_labels = iterator.get_next()
        return batch_features, batch_labels

    next_batch = my_input_fn('train.tfrecords', True)
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        for i in range(1):
            xs, y = sess.run(next_batch)
            print(xs['x'])
            print(y)
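
To make the padding step concrete, here is a small plain-Python sketch of what padding to maxlen=6 does to the two toy samples. It only illustrates the effect and is not how TensorFlow implements padded_batch. The helper name pad_batch is hypothetical, the pad value of 0 matches the default that padded_batch uses for numeric tensors, and raising an error for over-long samples is an assumption made for this sketch.

```python
def pad_batch(sequences, max_len, pad_value=0):
    """Right-pad every sequence in `sequences` to `max_len` with `pad_value`."""
    padded = []
    for seq in sequences:
        if len(seq) > max_len:
            # Assumption for this sketch: fail loudly instead of truncating.
            raise ValueError(
                "sequence of length %d exceeds max_len=%d" % (len(seq), max_len))
        padded.append(list(seq) + [pad_value] * (max_len - len(seq)))
    return padded


# The two toy samples written by writedata(), padded to maxlen=6.
batch = pad_batch([[1, 2, 3], [4, 5, 6, 8]], max_len=6)
for row in batch:
    print(row)
# prints [1, 2, 3, 0, 0, 0]
# prints [4, 5, 6, 8, 0, 0]
```

The batch returned for xs['x'] by the loading code should contain the same two rows, although with shuffling enabled their order may be swapped.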